PREFACE  (1985)


Are  we  intelligent  enough  to  understand  intelligence?  One  approach  to 
answering  this  question  is  “artificial  intelligence,”  the  field  of  computer  sci¬ 
ence  that  studies  how  machines  can  be  made  to  act  intelligently.  This  book  is 
intended  to  be  a  general  introduction  to  artificial  intelligence  (AI).  The  sub¬ 
jects  for  discussion  are  machines  that  can  solve  problems,  play  games,  recog¬ 
nize  patterns,  prove  mathematical  theorems,  understand  English,  and  even 
demonstrate  learning  by  changing  their  own  behavior  to  perform  such  tasks 
more  successfully.  In  general  this  book  is  addressed  to  all  persons  who  are 
interested  in  studying  the  nature  of  thought,  and  hopefully  much  of  it  can  be 
read  without  previous  formal  exposure  to  computers. 

In  this  book,  I  try  to  describe  the  major  experiments  that  have  already 
been  performed  and  to  indicate  some  of  the  open  questions  that  still  need  re¬ 
search  in  the  field  of  artificial  intelligence.  To  this  end,  the  exercises  that 
conclude  each  chapter  were  designed  not  only  to  give  students  some  prac¬ 
tice  in  the  subjects  discussed  explicitly,  but  also  to  direct  them  toward  other 
subjects  that,  for  want  of  space,  could  not  be  discussed.  The  exercises  were 
designed  to  flex  the  students’  own  intelligence,  as  well  as  to  help  develop 
machine  intelligence. 

However,  artificial  intelligence  can  and  should  be  studied  in  ways  that 
are  not  strictly  technical.  It  is  important  for  us  to  realize  how  this  science  is 
related  to  the  hopes  (and  fears)  of  humanity.  To  do  this  we  must  try  to  under¬ 
stand  people,  not  just  machines.  If  artificial  intelligence  is  to  be  developed 
beneficially,  it  will  have  to  become  one  of  our  most  humanistic  sciences. 
Happily,  there  is  a  vast  body  of  literature  (mostly  science  fiction)  that  can 
provide  a  sample  of  nontechnical  thinking  about  AI.  There  are  also  some 
excellent  motion  pictures  (especially  the  Star  Wars  series)  that  provide  a 
vision  of  what  AI  might  someday  produce, 
PREFACE  (1985)


Much  progress  has  been  made  in  the  field  of  artificial  intelligence  since 
this  book  was  first  published  in  1974.  The  book  as  originally  written,  how¬ 
ever,  remains  a  good  general  introduction  to  AI,  since  the  foundations  of  the 
field  remain  the  same.  Even  so,  this  edition  will  be  more  useful  because  of 
material  I  have  added  to  summarize  the  decade’s  progress  and  guide  the 
reader  to  further  study.  For  simplicity,  this  supplementary  material,  includ¬ 
ing  its  own  bibliography,  is  added  as  a  separate  section  immediately  follow¬ 
ing  this  Preface. 

In  retrospect,  the  following  people  have  to  date  made  the  greatest  contributions
to  my  work,  and  so  either  directly  or  indirectly  to  this  book:  J.  McCarthy,
A.  Samuel,  N.  Chapin,  E.  Deaton,  B.  Raphael,  J.  Munson,
R.  Manuck,  E.  Feigenbaum,  J.  Lederberg,  T.  Rindfleisch,  R.  Scroggs, 
M.  Uren,  I.  Laasi,  R.  Champion,  S.  Sickel,  I.  Pohl,  M.  Cunningham, 
H.  Crafts,  B.  Chatterjee,  J.  A.  Wheeler,  W.  Honig,  D.  Lenat,  R.  Schuet, 
E.  McGinnis,  R.  Roth,  D.  B.  Pedersen,  B.  A.  Bowman.

I  am  grateful  to  all  of  these  people  and  have  benefited  from  their  advice. 

It  should  be  expressly  noted  that  I  alone  am  responsible  for  the  content 
of  this  book.  Naturally,  I  hope  the  reader  will  find  that  its  value  greatly  out¬ 
weighs  its  errors,  and  I  apologize  for  any  errors  it  contains. 

Finally,  a  special  word  of  thanks  goes  to  my  parents,  whose  faith  and 
encouragement  have  made  this  effort  possible. 


Philip  C.  Jackson,  Jr.


DEVELOPMENTS 

1974-1984 


This  supplementary  section  presents  material  updating  the  text  of  the 
first  edition  of  Introduction  to  Artificial  Intelligence,  which  has  been  kept  in¬ 
tact.  The  new  material  is  organized  to  parallel  the  coverage  of  the  original 
chapters,  and  readers  unfamiliar  with  the  first  edition  may  wish  to  read  each 
original  chapter  first,  then  turn  to  the  supplementary  section  for  that 
chapter. 

In  some  cases,  references  are  made  to  the  original  chapters  by  using  just 
chapter  or  page  numbers.  Parentheses  enclose  the  year  portion  of  references 
to  entries  in  the  original  Bibliography  at  the  end  of  the  book,  while  square 
brackets  are  used  in  references  to  entries  in  the  Supplementary  Bibliography 
at  the  end  of  this  new  section.

It  should  be  emphasized  that  this  new  material  is  only  an  introduction 
to  AI  research  for  the  decade  1974-1984,  just  as  the  original  text  of  this  book 
is  only  an  introduction  to  AI  research  up  to  1974.  Because  of  space  limita¬ 
tions,  the  supplementary  material  cannot  discuss  the  decade’s  research  in 
thorough  detail.  Rather,  it  tries  to  summarize  and  give  pointers  to  some  of 
the  decade’s  major  research.  Hopefully,  the  reader  will  follow  these  pointers 
to  gain  greater  knowledge  of  the  entire  field,  including  research  not
cited. 


1.  INTRODUCTION 

Much  of  the  material  in  Chapter  1  needs  very  little  updating.  Recently, 
however,  several  books  have  been  published  that  contain  material  relevant 
to  the  coverage  in  Chapter  1.  Among  these,  the  reader  is  referred  to  Albus
[1981],  Boden  [1977],  Kent  [1981],  and  Sagan  [1977]. 

Also,  it  should  be  noted  that  since  1974  several  books  have  been  writ¬ 
ten  that  are  complete  texts  on  artificial  intelligence,  with  various  levels  of 
coverage  and  emphasis.  The  reader  is  encouraged  to  study  such  texts,  in¬ 
cluding  Banerji  [1980],  Barr,  Cohen,  and  Feigenbaum  [1981],  Bellman 
[1978],  Nilsson  [1980],  Raphael  [1976],  and  Winston  [1977].  Other,  more 
specialized  texts  are  cited  below. 

In  general,  the  proceedings  of  conferences  on  artificial  intelligence  are 
the  major  sources  cited  in  these  supplementary  notes;  the  reader  can  find 
detailed  summaries  of  most  published  AI  research  in  the  Proceedings  of  the 
American  Association  for  AI  and  International  Joint  Conferences  on  AI, 
which  now  run  to  7,121  pages,  spanning  the  years  1969  to  1983. 


2.  MATHEMATICS,  PHENOMENA,  MACHINES 


The  goal  of  Chapter  2  was  to  present  some  of  the  mathematical  theory 
underlying  artificial  intelligence  and  computer  science  in  general.  In  partic¬ 
ular  I  discussed  whether  there  was  any  way  in  theory  of  proving  mathemati¬ 
cally  that  machines  could  or  could  not  be  intelligent.  In  addition,  I  pre¬ 
sented  some  practical  limitations  that  affect  computers  because  they  are 
real-world  machines  subject  to  the  laws  of  physics.  These  results  from  math¬ 
ematics  and  physics  are  useful  in  reasoning  about  computers  and  the  limita¬ 
tions  of  artificial  intelligence,  but  not  in  themselves  sufficient  to  prove  or 
disprove  the  attainability  of  true  artificial  intelligence. 

Naturally,  scientists  have  continued  to  discuss  this  question,  arguing 
both  for  and  against  the  ultimate  achievability  of  true  intelligence  by  com¬ 
puters.  And  into  this  debate  they  have  introduced  considerations  from  other 
sciences. 

Regarding  the  general  theoretical  limitations  of  artificial  intelligence, 
Haugeland  [MD;  1981]  includes  several  papers  arguing  against  the  possibility
of  a  truly  complete  artificial  intelligence,  one  that  could  duplicate  or  surpass
human  thought,  as  well  as  other  papers  that  discuss  AI  methodology
but  are  not  skeptical  of  its  ultimate  success.  (The  scope  of  this  collection 
makes  it  an  important  AI  reference.)  The  arguments  against  AI  (by  Dreyfus, 
Haugeland,  Searle,  Davidson,  and  others)  draw  on  relevant  issues  in  the 
fields  of  psychology,  philosophy,  and  biology. 

They  argue  that  computers  cannot  duplicate  the  biochemistry  of  the 
human  brain,  which  prevents  AI  from  duplicating  moods,  emotions,  aware¬ 
ness,  feelings,  and  other  phenomena  important  to  human  thought.  Also, 
they  argue  that  “understanding”  concepts  is  fundamentally  different  from
symbol  manipulation;  that  sensorimotor  (and  other)  skills  are  not  devel¬ 
oped  by  thought  processes  such  as  those  studied  by  AI  and  its  sister  field, 
cognitive  psychology;  that  human  thought  is  “holistic”  and  cannot  be  divid¬ 
ed  into  subprocesses  in  the  way  that  AI  approaches  it;  that  human  thought 
deals  with  infinite  exceptions  and  ambiguities  and  thus  is  too  complex  for 
computers.  (I  do  not  say  that  each  of  the  authors  listed  above  subscribes  to 
all  of  these  claims.) 

I  alluded  to  some  of  these  concerns  myself  in  Chapter  2,  for  example  by 
noting  that  the  universe  might  contain  phenomena  which  are  not  finitely 
describable,  and  that  the  human  brain  is  architecturally  different  from  pres¬ 
ent  computers.  Because  of  this  I  concluded  that  it  is  an  open  question 
whether  computers  could  ever  duplicate  all  the  abilities  of  human  intelli¬ 
gence,  though  it  seems  clear  they  can  emulate  some. 

The  argument  that  understanding  is  fundamentally  different  from  symbol
manipulation,  however,  is  particularly  crucial  to  AI  research,  since  a
major  approach  of  AI  has  been  to  apply  discrete  symbol-manipulation  techniques
(via  digital  computers)  to  tasks  which  in  humans  involve  “understanding.”
For  example,  AI  programs  have  been  written  that  “understand”
sentences  in  English  and  other  natural  languages  (see  Chapter  7  and  its  sup¬ 
plement,  below). 

Searle  [1981]  gives  an  especially  clear  argument  that  symbol  manipulation
cannot  be  equivalent  to  human  understanding,  using  a  variation  of
Turing’s  test  (see  Chapter  1).  In  essence,  Searle  argues  that  a  human  could 
perform  a  computer’s  symbol-manipulation  procedures,  and  appear  to  understand
a  foreign  language,  without  actually  understanding  the  language  at
all.  Searle  asks  the  reader  to  agree  through  introspection  that  symbol  manipulation
is  qualitatively  different  from  true  “understanding.”

Recent  papers  by  McDermott  [1983]  and  Woods  [1983]  counter  this  argument,
basically  by  contending  that  understanding  really  is  a  process  of
symbol  manipulation:  they  contend  in  essence  that  understanding  is  a  process
that  deals  symbolically  with  “meaning  rules”  which  represent  interpretations
of  other  symbols.  Sloman  [1983]  suggests  that  the  built-in  interpretations
for  truth,  conditionality,  numbers,  etc.,  that  are  provided  by  the
machine  languages  of  computers  furnish  a  starting  point  for  studying  how 
other  machine  architectures  can  provide  interpretations  of  concepts. 

I  think  these  responses  are  adequate  to  show  that  computers  can  emu¬ 
late  understanding,  i.e.,  at  least  behave  as  though  they  understand  concepts 
within  some  limitations  of  scope  and  complexity.  This  should  be  adequate 
for  AI  systems  to  achieve  useful  results,  so  that  the  general  public  will  collo¬ 
quially  say  that  AI  systems  “understand”  some  concepts,  though  scientists 
should  remain  cautious  in  their  comparisons. 
Whether  computers  can  “really”  understand  concepts  just  as  we  do  will
require  more  understanding  of  human  intelligence  to  decide.  Perhaps 
human  intelligence  “internalizes”  its  symbolic  manipulation  at  a  lower  level 
of  brain  functioning  to  produce  the  sensations  of  human  “understanding,” 
“consciousness,”  etc.,  and  these  sensations  are  not  duplicated  when  humans 
process  symbols  consciously,  as  in  Searle’s  variation  of  Turing’s  test.  This 
seems  to  be  the  essence  of  the  responses  of  Sloman,  McDermott,  and  Woods 
to  Searle’s  argument. 

Even  so,  Dreyfus  [1981]  notes  that  Husserl  and  Heidegger  encountered 
an  apparently  endless  task  in  their  attempts  to  define  human  concepts  symbolically,
and  warns  that  AI  confronts  the  same  problem.  Undaunted  by
such  problems,  I  have  proposed  a  high-level  initial  design  for  a  system  that 
would  develop  its  own  concepts  to  demonstrate  general-purpose  artificial 
intelligence  in  real-world  environments  (Jackson  [1984]). 

Regarding  the  physical  limitations  of  real-world  computers,  the  limits 
described  in  Chapter  2  still  apply,  of  course.  Engineering  progress  has  increased
memory  sizes  and  reduced  access  and  computation  times,  generally
by  an  order  of  magnitude  over  the  “conventional”  and  “attainable”  numbers 
given  in  Chapter  2.  This  progress  continues  rapidly,  so  that  specific  numbers 
cited  at  the  time  this  is  written  would  be  obsolete  by  the  time  this  reaches  the 
reader.  What  is  important  is  simply  that  these  finite  limitations  still  apply, 
constraining  the  computing  power  of  machines.  It  should  be  noted  that  en¬ 
gineers  are  still  eons  away  from  the  “theoretical”  limits  to  information  pro¬ 
cessing  based  on  quantum  theory,  presented  in  Chapter  2.  Also,  the  process¬ 
ing  rates  predicted  for  “coherent  optical  logic”  (Culver  and  Mehran,  1971) 
are  as  yet  unfulfilled. 

Again,  these  limits  in  themselves  do  not  answer  the  question  of  AI’s  ulti¬ 
mate  attainability.  However,  one  development  deserves  special  attention: 
the  integrated-circuit  “microcomputers”  (mentioned  on  page  60)  have 
evolved  spectacularly,  so  that  rather  complex  systems  now  occupy  very  little 
space.  For  example,  up  to  450,000  transistors  can  be  placed  on  a  ¼-inch-square
“chip”  (Beyers  et  al.  [1983]).  The  evolution  of  microcomputers  holds
great  promise  for  artificial  intelligence,  especially  in  parallel-processing  sys¬ 
tems.  (Regarding  developments  in  this  field,  see  the  supplement  to  Chapter 
8,  below.) 

Finally,  it  should  be  noted  that  Hofstadter  [1980]  and  Rucker  [1982] 
present  exuberant,  insightful  introductions  to  subjects  in  mathematical 
logic  that  underlie  the  field  of  artificial  intelligence.  In  particular,  they  treat 
Gödel’s  incompleteness  theorem  regarding  unsolvable  problems  in  mathematical
logic.  The  reader  may  wish  to  compare  their  expositions  of  this
topic,  and  its  relation  to  AI,  to  Chapter  2’s  presentation  of  the  Halting  Prob¬ 
lem,  a  related  unsolvable  problem  that  fundamentally  limits  the  abilities  of 
machines.  Again,  such  unsolvable  logic  problems  do  not  limit  machines  any 
more  than  they  limit  people.  It  is  interesting,  however,  that  the  recursive  way
in  which  these  problems  are  stated  reminds  us  of  the  question,  “Are  we  intel¬ 
ligent  enough  to  understand  intelligence?” 

Though  AI  has  made  enormous  progress  in  the  last  decade,  understand¬ 
ing  intelligence  remains  an  unsolved  challenge  to  our  intelligence.  The 
progress  in  AI  so  far  has  also  increased  our  awareness  of  how  much  we  do  not 
know.  Whether  or  not  machines  can  ever  be  truly  intelligent,  however,  AI  re¬ 
search  has  shown  that  even  limited  forms  of  machine  intelligence  have  great 
utility. 


3.  PROBLEM  SOLVING 

This  chapter  discusses  concepts  of  “problem  solving”  that  are  central  to 
AI  research.  As  would  be  expected,  AI  researchers  have  continued  to  develop 
and  explore  these  concepts,  which  retain  their  centrality. 

One  major  concept  for  problem  solving  remains  “heuristic  search”  in
the  “state-space”  paradigm.  Nilsson  [1980]  and  Barr  et  al.  [1981]  give  thorough
presentations  of  this  subject  that  complement  the  treatment  given  in
Chapter  3.  Dechter  and  Pearl  [1983]  present  recent  theoretical  results  on  the 
Hart-Nilsson-Raphael  (1968)  heuristic  search  algorithm  (known  as  the  A* 
algorithm)  that  further  support  the  optimality  of  this  search  algorithm. 
Pearl  and  Kim  [1982]  and  Ghallab  and  Allard  [1983]  show  the  value  of  A* 
variations  that  search  for  solutions  that  are  “nearly”  optimal  instead  of  com¬ 
pletely  optimal. 
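
As an illustration only, the general shape of an A*-style search can be sketched in Python. This is a minimal sketch and not the formulation of any paper cited above; the grid, the wall positions, and the names `a_star`, `grid_neighbors`, and `manhattan` are assumptions made for this example. The heuristic is assumed never to overestimate the remaining cost.

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Return a lowest-cost path from start to goal, or None if none exists.

    neighbors(state) yields (next_state, step_cost) pairs.
    heuristic(state) estimates the remaining cost and is assumed
    never to overestimate it.
    """
    frontier = [(heuristic(start), 0, start)]  # entries are (f, g, state)
    best_g = {start: 0}                        # cheapest known cost to each state
    parent = {start: None}
    while frontier:
        f, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue                           # stale entry, a cheaper route was found
        if state == goal:
            path = []
            while state is not None:           # walk parent links back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt, cost in neighbors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = state
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return None

# Hypothetical 4-by-4 grid with a wall that leaves a single gap at (1, 3).
SIZE = 4
WALLS = {(1, 0), (1, 1), (1, 2)}
GOAL = (3, 0)

def grid_neighbors(cell):
    x, y = cell
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1

def manhattan(cell):
    # City-block distance to the goal; it never overestimates on this grid.
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])

path = a_star((0, 0), GOAL, grid_neighbors, manhattan)
```

Here `path` is a list of grid cells from the start to the goal. Variations that accept “nearly” optimal solutions generally relax the ordering of the frontier; they are not shown here.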

Recently,  Kumar  and  Kanal  [1983]  have  shown  that  a  variety  of 
problem-solving  algorithms  can  be  “unified”  via  representation  with 
context-free  grammars  as  “composite  decision  processes.”  (See  Chapter  7  re¬ 
garding  context-free  grammars.)  Stockman  [1979]  and  Berliner  [1979]  pre¬ 
sent  recent  search  algorithms  for  AND/OR  trees,  which  Kumar  and  Kanal 
have  shown  are  closely  related  to  the  alpha-beta  procedure  described  in
Chapter  4.
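
For readers who want a concrete reminder of the alpha-beta procedure mentioned above, the following is a minimal Python sketch. It assumes a game tree written as nested lists with numeric leaves; that representation, the function name `alpha_beta`, and the optional `visited` list are choices made for this example and are not taken from Chapter 4 or from the cited papers.

```python
def alpha_beta(node, maximizing=True, alpha=float("-inf"), beta=float("inf"),
               visited=None):
    """Return the minimax value of a nested-list game tree, pruning branches
    that cannot affect the result. Leaves are numbers; if a list is passed
    as `visited`, every leaf actually examined is appended to it."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break          # the minimizing player will never allow this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break              # the maximizing player already has a better option
    return value

tree = [[3, 5], [2, 9], [0, 1]]   # hypothetical two-ply game
seen = []
best = alpha_beta(tree, visited=seen)
```

In this small tree the leaves 9 and 1 are never examined, because the first move already guarantees the maximizing player a value of 3.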

Heuristic  search  procedures  require  algorithms,  called  “heuristics,”  for
estimating  the  values  of  nodes  in  the  state  space  being  searched.  Valtorta 
[1983],  Pearl  [1982],  and  Gaschnig  [1979]  have  shown  that  heuristic-estimate
functions  can  be  automatically  derived  by  a  search  procedure  that  solves 
“auxiliary”  problems  related  to  the  original  problem  state  space.  However, 
Valtorta  also  shows  it  is  not  efficient,  in  general,  to  use  this  method.  The  de¬ 
velopment  of  good  heuristic-estimate  functions  remains  a  key  problem  for 
AI  research:  experience  indicates  that  this  is  an  area  where  the  problem  of 
problem  representation  remains  central  (see  Chapter  3).  AI  researchers  are 
still  at  the  frontier  of  developing  systems  that  can  develop  their  own  problem
representations.  (See  Amarel  (1968),  Lenat  [1982],  Lenat  and  Brown
[1983],  and  Jackson  [1984].) 

However,  AI  researchers  have  long  recognized  the  centrality  of  the  representation
problem,  and  in  the  decade  1974-1984  concentrated  substantially
on  machine  representations  of  all  forms  of  human  knowledge  (including
problems).  “Knowledge  representation”  has  become  a  major  subdomain
of  AI  research.  Among  the  multitude  of  papers  on  this  subject,  the  reader
should  certainly  be  pointed  to  Minsky  [1982],  Schank  and  Colby  [1973], 
Lenat  [1979],  and  Lenat  and  Greiner  [1980]. 

Lenat’s  work  in  particular  should  be  briefly  described,  because  it  demonstrated
major  progress  in  the  past  decade.  Lenat  [1979]  describes  a  computer
program  called  AM,  which  used  a  LISP-based  representation  to  emulate
discoveries  of  concepts  in  elementary  mathematics.  For  example,
starting  with  LISP-structures  to  represent  concepts  of  set  theory  (“set,” 
“union,”  etc.),  AM  used  heuristics  to  develop  LISP  structures  representing 
higher-level  concepts  of  elementary  mathematics  (“natural  number,” 
“prime,”  etc.).  AM  could  also  discover  conjectures  (relations  between  con¬ 
cepts),  though  it  was  not  designed  to  prove  theorems.  For  example,  AM  was 
able  to  conjecture  the  unique  factorization  theorem,  that  any  natural  num¬ 
ber  can  be  uniquely  factored  into  prime  numbers.  Lenat  and  Greiner  [1980] 
describe  the  evolution  of  this  approach  into  “RLL,”  a  “representation- 
language  language”  used  for  representing  concepts  in  arbitrary  domains  be¬ 
sides  mathematics.  Lenat  and  Brown  [1983]  give  a  recent  analysis  of  this 
research. 

Besides  heuristics  and  the  representation  problem,  the  concepts  of 
planning,  evolution  vs.  reason  in  problem  solving,  analogies,  learning,  and 
“skilled”  (or  “expert”)  problem  solvers  all  remain  central  to  AI.  Researchers 
have  continued  to  develop  these  topics,  in  many  cases  combining  them  (see, 
for  example,  Rendell  [1983],  Subrahmanian  [1983],  Salzberg  [1983], 
Mostow  [1983a],  Carbonell  [1983],  Georgeff  [1983],  and  Kim  and 
McDermott  [1983]).  “Learning”  especially  has  been  a  major  AI  research 
topic,  with  a  recent  book  surveying  the  subject  (Michalski,  Carbonell,  and 
Mitchell  [1983]).  The  link  between  learning  and  knowledge  representation 
is  analyzed  in  a  recent  paper  by  Scott  [1983].  Lebowitz  [1983]  illustrates  this 
topic  in  a  system  which  generalizes  representations  of  patent  abstracts. 
Burstein  [1983]  and  Douglas  and  Moran  [1983]  present  results  on  learning 
by  analogies. 

Finally,  AI  has  continued  to  make  impressive  progress  in  developing 
expert  systems  that  can  demonstrate  skill  in  performing  tasks  previously 
requiring  trained  human  intelligence.  As  a  result,  “expert  systems”  have  be¬ 
come  another  major  subdomain  of  AI  research.  The  main  tools  for  con¬ 
structing  expert  systems  have  been  “knowledge-representation”  systems, 
typically  using  “production  rules”  and  “backtracking”  (Chapters  3,  4,  6)  to 
control  how  knowledge  is  used  by  the  expert  system.  A  textbook  edited  by
Hayes-Roth,  Waterman,  and  Lenat  [1983]  provides  a  standard  reference  on
theories  and  techniques  for  building  expert  systems.  Benjamin  and  Harri¬ 
son  [1983]  describe  work  on  a  system  that  can  learn  its  expert  behavior  by 
generalizing  from  examples  of  expert  human  behavior.  The  1983  IJCAI  and
AAAI  Proceedings  contain  55  papers  explicitly  on  expert  systems,  with  many 
others  indirectly  related,  which  indicates  the  magnitude  of  research  in  this 
area. 
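
To make the phrase “production rules and backtracking” concrete, the following is a minimal Python sketch of goal-directed rule use. It is not modeled on any particular system cited above; the rule base about a car that will not start, and the names `RULES` and `prove`, are hypothetical and chosen only for illustration.

```python
# Each rule is (set of premises, conclusion). The contents are hypothetical.
RULES = [
    ({"battery_dead"}, "engine_wont_start"),
    ({"fuel_empty"}, "engine_wont_start"),
    ({"lights_dim", "starter_silent"}, "battery_dead"),
]

def prove(goal, facts, rules, pending=frozenset()):
    """Try to establish goal from known facts by working backward through
    the rules. When one rule fails, the loop moves on to the next rule
    for the same goal, which is a simple form of backtracking."""
    if goal in facts:
        return True
    if goal in pending:
        return False                      # already trying this goal; avoid a cycle
    for premises, conclusion in rules:
        if conclusion != goal:
            continue
        if all(prove(p, facts, rules, pending | {goal}) for p in premises):
            return True
    return False

observed = {"lights_dim", "starter_silent"}
diagnosis = prove("engine_wont_start", observed, RULES)
```

With the observations shown, the goal is established through the battery rule. With only `fuel_empty` observed, the first rule fails and the second succeeds.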

In  particular,  expert  systems  are  envisioned  as  augmenting  (and  in
some  cases  supplanting)  human  intelligence  in  development  of  the  “fifth
generation”  of  computers  (Feigenbaum  and  McCorduck  [1983]).  In  turn,
the  fifth  generation  of  computers  should  support  even  more  advanced  AI 
processes,  including  expert  systems.  Lenat  et  al.  [1983]  describe  one  such 
ambitious,  long-range  project,  called  Knoesphere,  an  expert  system  able  to
represent,  and  intelligently  explain,  the  knowledge  of  an  encyclopedia. 
Jackson  [1984]  presents  a  similarly  ambitious,  high-level  design  of  a  system 
that  would  develop  its  own  concepts,  demonstrating  general-purpose  intelli¬ 
gence  in  real-world  environments. 


4.  GAME  PLAYING 

Game  playing  has  remained  a  valuable  subfield  of  AI  research,  and  the 
concepts  presented  in  Chapter  4  have  remained  central  to  computer  pro¬ 
grams  that  play  games.  Games  have  been  useful  in,  and  have  benefited  from, 
advances  in  other  AI  fields,  such  as  knowledge  representation  and  evolu¬ 
tionary  programs. 

Zhang  and  Zhang  [1983]  discuss  application  of  the  statistical-inference 
method  to  the  A*  heuristic  search  algorithm,  claiming  it  results  in  a  superior
game-search  algorithm.  Other  important  work  on  search  algorithms  is  mentioned
in  the  supplement  to  Chapter  3,  above.

Chess  has  continued  to  be  a  major  focus  of  AI  research  in  game  playing. 
As  of  this  writing,  computers  cannot  yet  consistently  win  against  human 
grandmasters.  Berliner  [1981]  discusses  the  use  of  brute-force  search  tech¬ 
niques  in  Chess  programs,  which  in  conjunction  with  supercomputers  are 
presently  the  best  AI  Chess  systems.  Kaindl  [1983]  discusses  a  more  in¬ 
formed  (relying  less  on  brute  force)  Chess  search  for  “quiescent”  states  of 
the  game.  Simon  and  Gilmartin  [1974]  describe  an  earlier  Chess  program 
which  recognized  some  “patterns,”  or  related  groups  of  Chess  pieces. 

Some  AI  researchers  have  concentrated  on  Chess  endgames,  which  are 
relatively  simpler  than  full  Chess.  For  example,  Bramer  [1975]  studied 
knowledge  representation  in  Chess  endgames.  Building  on  this,  van  den 
Herik  [1983]  describes  an  expert  system  for  a  class  of  Chess  endgames
(King,  Bishop,  and  Knight  vs.  King).  Systems  like  this  use  patterns  to  guide
the  selection  of  rules  for  actions,  in  addition  to  depth-first  search.  Campbell 
and  Berliner  [1983]  describe  a  similar  program  for  King  and  Pawn 
endgames.  Jackson  [1984]  discusses  some  of  the  “conceptual  structures”
that  might  be  used  in  representing  various  levels  of  knowledge  about  Chess. 

S.  F.  Smith  [1983]  describes  a  genetic  algorithm  (evolutionary  pro¬ 
gram)  that  develops  its  own  production  rules  for  playing  Poker  and  learns  to 
play  at  the  same  level  as  Waterman’s  program,  described  in  Chapter  4. 
Rendell  [1983,  1984]  studied  genetic  algorithms  for  learning  heuristic- 
search  evaluation  functions,  relating  this  to  Samuel’s  Checkers  program 
(1967).

Games  have  even  helped  in  studying  the  evolution  of  knowledge.  For 
example,  Hunt  [1983]  discusses  the  game  of  Mastermind,  in  which  one  play¬ 
er  tries  to  break  a  four-color  code  selected  by  another,  and  shows  that  guess¬ 
es  known  to  be  false  can  at  some  points  give  more  information  toward  break¬ 
ing  the  code  than  guesses  that  are  not  known  to  be  false.  Hunt  relates  this  to 
a  similar  problem  in  decoding  DNA  strings. 
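
The notion of a guess that is “known to be false” can be made precise with a short Python sketch. This is an illustration under stated assumptions, not a reconstruction of Hunt's analysis: it assumes a code of four positions drawn from four colors, and the names `feedback`, `consistent`, and `split` are invented for the example.

```python
from collections import Counter
from itertools import product

COLORS = "RGBY"   # assumed palette of four colors
LENGTH = 4        # assumed code length

def feedback(secret, guess):
    """Return (exact, misplaced): pegs of the right color in the right
    position, and pegs of the right color in the wrong position."""
    exact = sum(s == g for s, g in zip(secret, guess))
    shared = sum((Counter(secret) & Counter(guess)).values())
    return exact, shared - exact

def consistent(candidate, history):
    """True if candidate would have produced every feedback seen so far."""
    return all(feedback(candidate, guess) == fb for guess, fb in history)

def split(guess, candidates):
    """Count how many candidates would yield each possible feedback."""
    return Counter(feedback(c, guess) for c in candidates)

all_codes = ["".join(p) for p in product(COLORS, repeat=LENGTH)]
secret = "RGBY"
history = [("RRGG", feedback(secret, "RRGG"))]
remaining = [c for c in all_codes if consistent(c, history)]
```

A guess outside `remaining` is known to be false, since it contradicts earlier feedback. Comparing `split(guess, remaining)` for guesses inside and outside `remaining` is one way to explore how much each kind of guess narrows the field; no particular outcome is claimed here.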

In  summary,  it  seems  clear  that  games  will  remain  a  valuable  subfield 
of  AI  research,  with  the  potential  for  shedding  light  on  and  testing  AI  results 
in  other  domains.  In  particular,  Chess  will  remain  a  challenge  to  AI  research¬ 
ers,  providing  a  domain  in  which  efforts  can  be  focused  on  expert  systems, 
knowledge  representation,  and  learning. 


5.  PATTERN  PERCEPTION 


Chapter  5  discusses  “pattern  perception”  as  it  occurs  in  AI  vision  sys¬ 
tems,  and  also  more  generally  as  it  occurs  in  other  domains  of  AI.  The 
growth  of  robotics  has  meant  continued,  extensive  research  into  vision  sys¬ 
tems,  though  the  principles  described  in  Chapter  5  remain  basic  to  more  re¬ 
cent  research.  Research  in  pattern  perception  has  also  benefited  from  work 
in  other  AI  domains,  such  as  knowledge  representation,  production-rule  sys¬ 
tems,  etc.  The  following  indicates  just  some  of  the  recent  research  in  this 
area. 

Brooks  [1981]  describes  a  high-level  vision  system  (called  ACRONYM) 
that  relies  on  models  of  objects  and  of  the  scene-to-image  transformation  to
predict  how  an  object  will  appear,  given  the  program’s  knowledge  represen¬ 
tation  of  a  scene.  Fisher  [1983]  extends  this  approach  in  a  program  that 
matches  regions  of  a  picture  image  to  models  of  surfaces,  and  then  hypothe¬ 
sizes  possibly  occluded  objects  in  a  scene.  Glicksman  [1983]  describes  an¬ 
other  system  which  uses  “semantic  information  structures”  (see  Chapter  7) 
for  visual-information  processing. 
Diamond  et  al.  [1983]  describe  an  edge-detection  method  that  is  oriented
toward  parallel  processing.  (Parallel-processing  computer  architectures
have  made  substantial  progress  in  the  past  decade;  see  the  supplementary
material  for  Chapter  8,  below.)  Fischler  and  Wolf  [1983]  describe  the
iterative  use  of  “smoothing”  algorithms  to  identify  lines  in  a  picture.  Of 
course,  numerous  other  researchers  have  addressed  these  topics. 

Horn  [1977]  gives  an  important  analysis  of  the  physics  of  image  formation
and  its  relation  to  visual  perception.  Several  authors  have  studied  the
recognition  of  object  shapes  using  shading,  texture,  and  stereo  visual  infor¬ 
mation  (see,  for  example,  Ikeuchi  and  Horn  [1981],  Witkin  [1981],  and 
Grimson  [1981]).

Pentland  [1983]  discusses  how  to  compute  “fractal”  representations  of 
patterns  from  visual  image  data.  This  approach  may  be  especially  important 
in  the  visual  perception  of  real-world  scenes,  which  often  include  complexi¬ 
ties  such  as  mountains,  trees,  clouds,  etc.  Fractals  are  a  very  interesting  class 
of  mathematical  functions  developed  by  Mandelbrot  [1977].  The  basic  concept
of  fractals  is  that  objects  have  different  shapes  depending  on  the  scale  at
which  they  are  measured.  For  example,  a  mountain  may  appear  to  be  round¬ 
ed  from  a  distant  viewpoint  and  more  irregular  from  a  closer  viewpoint.  The 
success  of  fractals  in  representing  visual  patterns  suggests  that  they  should 
be  investigated  for  pattern  perception  in  other  AI  domains. 
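
The dependence of measurement on scale can be seen in a small numerical sketch. The example below uses the Koch curve, a standard textbook fractal chosen here for convenience; it is not taken from Pentland's method, and the function names `koch` and `length` are assumptions of this example. Each refinement level corresponds to measuring with a ruler one third as long.

```python
def koch(depth):
    """Return the vertices of a Koch curve from 0 to 1 as complex numbers."""
    points = [0j, 1 + 0j]
    turn = complex(0.5, 3 ** 0.5 / 2)      # multiplying by this rotates 60 degrees
    for _ in range(depth):
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            third = (b - a) / 3
            # Replace the segment a-b with four segments, each one third as long.
            refined += [a + third, a + third + third * turn, a + 2 * third, b]
        points = refined
    return points

def length(points):
    """Total length of the polyline through the given points."""
    return sum(abs(b - a) for a, b in zip(points, points[1:]))

# Measured length at successively finer scales.
lengths = [length(koch(d)) for d in range(5)]
```

The measured length grows by a factor of 4/3 at each finer scale, so the same curve looks smooth and short from afar and long and irregular up close.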

Nagel  [1983]  gives  some  recent  mathematical  results  relevant  to  visual 
perception  of  motion,  and  several  other  references  to  this  problem.  Cowie 
[1983]  also  discusses  this  topic,  as  well  as  other  relations  between  observers 
and  the  observed.  Thorpe  and  Shafer  [1983]  discuss  this  topic  in  relation  to 
Huffman-Clowes  labeling  algorithms.

Of  course  the  above  material  can  reference  only  a  small  subset  of  the  re¬ 
search  in  pattern  perception.  In  addition  to  the  various  conference  proceed¬ 
ings  (IJCAI8  alone  has  over  40  papers  on  vision  systems),  the  reader  is  espe¬ 
cially  encouraged  to  study  the  following  texts:  Ahuja  and  Schachter  [1983], 
Barr  et  al.  [1981],  Hanson  and  Riseman  [1978],  Kandel  [1982],  Miller  and
Johnson-Laird  [1976],  Nevatia  [1982],  Pavlidis  [1977],  Pugh  [1983],  Rock
[1975],  Tanimoto  and  Klinger  [1980],  Ullman  [1979],  and  Winston  [1975, 
1977].  Mackworth  [1983]  gives  an  excellent  overview  of  the  past  decade’s 
work  on  computer  vision. 


6.  THEOREM  PROVING 


Researchers  have  also  remained  very  active  in  the  study  of  theorem¬ 
proving  techniques  and  in  using  such  techniques  in  AI  systems.  Wos  [1983] 
describes  one  of  the  most  successful  theorem-proving  systems,  an  “automated
reasoning  assistant”  (called  AURA)  which  was  successfully  used  to  answer
some  previously  open  questions  in  mathematics  and  formal  logic.
AURA  has  also  been  used  for  design  and  validation  of  logic  circuits. 

Resolution-based  theorem  provers  have  continued  to  be  of  interest. 
Kowalski  [1975],  Sickel  [1976],  and  Stickel  [1982]  describe  methods  to 
make  resolution  more  efficient  by  storing  unifications  in  “connection 
graphs.” Stickel [1983] describes a recent resolution-based theorem-proving
system,  with  comparisons  to  other  variations  of  resolution.  IJCAI8  presents 
several  other  papers  on  resolution. 
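
Since resolution depends on unifying pairs of literals, a small sketch of unification may help make the idea concrete. The Python below is only an illustration under stated assumptions, not the method of any of the cited systems: terms are nested tuples, capitalized strings are treated as variables, the names `is_var`, `walk`, and `unify` are hypothetical, and the occurs check is omitted for brevity.

```python
def is_var(term):
    # Assumed convention for this sketch: capitalized strings are variables.
    return isinstance(term, str) and term[:1].isupper()

def walk(term, subst):
    # Follow variable bindings until reaching a non-variable or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(x, y, subst=None):
    """Return a substitution (dict) that makes x and y equal, or None if there is none."""
    subst = {} if subst is None else subst
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if is_var(x):
        return {**subst, x: y}
    if is_var(y):
        return {**subst, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# p(X, b) and p(a, Y) unify with X = a and Y = b.
bindings = unify(("p", "X", "b"), ("p", "a", "Y"))
```

A connection graph, in these terms, would store such substitutions on the links between clauses so that they need not be recomputed at each resolution step.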

AI  researchers  have  become  very  interested  in  using  theorem-based  sys¬ 
tems  to  represent  various  “higher-level”  reasoning  problems.  Many  have  fo¬ 
cused  on  “nonmonotonic  logic,”  which  allows  reasoning  about  statements 
that  can  have  exceptions  and  might  be  retracted  (e.g.,  “all  birds  can  fly”). 
(See, for example, McDermott and Doyle [1980], McCarthy [1980], Reiter
[1980], and Moore [1983].) This is closely related to “default reasoning,” in
which statements are accepted by default (see, for example, Winograd
[1980], Rich [1983], and Nutter [1983]). McCarthy [1979] has continued to
study  the  logic  of  reasoning  about  knowledge  and  action.  (See  also  Moore 
[1979].)  IJCAI8  has  numerous  other  papers  related  to  such  logical  systems. 
The reader should also consult a text edited by Mamdani and Gaines [1981]
on  “fuzzy  reasoning,”  which  includes  recent  papers  on  the  field  originated 
by  Zadeh  (1965,  1968). 
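
The way a default conclusion can be withdrawn is easy to show in a toy program. The sketch below is an assumption-laden illustration rather than any of the cited formalisms: the function name `can_fly`, the fact strings, and the list of exceptions are all hypothetical. It accepts “birds fly” by default and retracts the conclusion when an exception becomes known.

```python
# Toy default reasoning: "birds fly" holds unless a known exception blocks it.
# The exception list and fact names are illustrative assumptions.
EXCEPTIONS = {"penguin", "ostrich", "injured"}

def can_fly(facts):
    """Conclude flight by default for a bird, unless some exception is among the facts."""
    if "bird" not in facts:
        return False
    return not (facts & EXCEPTIONS)

beliefs = {"bird"}
before = can_fly(beliefs)          # the default applies

beliefs = beliefs | {"penguin"}    # a new fact is learned
after = can_fly(beliefs)           # the earlier conclusion is retracted
```

The point of the example is that adding a fact reduced the set of conclusions, which cannot happen in ordinary (monotonic) logic.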

The  supplement  to  Chapter  3,  above,  describes  the  work  of  Lenat,  who 
studied  the  development  of  concepts  in  mathematical  theories  and  has  been 
a  leader  in  work  on  expert  systems.  As  Nilsson  [1984]  points  out,  much  of 
the work on expert systems can be viewed as an application of theorem proving:
most expert systems make use of production rules and can be viewed as
“backward-chaining  theorem  provers.”  An  important  development  has 
been the use of production-rule systems to represent the “meta-level” control
structures of expert systems, in addition to representing expert knowledge
within those systems (Genesereth and Smith [1982]). Another achievement
is the creation of a widely accepted programming language (PROLOG)
for implementing production-rule systems, which has become a major competitor
of LISP in AI research efforts (Colmerauer [1975], Warren et al.
[1977], and Kowalski [1979]). Much of this work can be related to the earlier
work  of  Hewitt  (1972)  described  in  Chapter  6.  Weyhrauch  [1980]  developed 
a  theory  of  “semantic  attachments”  between  propositional  and  procedural 
systems. 
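
To suggest what “backward chaining” over production rules means, here is a minimal sketch in Python. It is an illustration only, not the design of any particular expert system: the rule format (a conclusion paired with a list of premises), the sample rules, and the function name `prove` are assumptions made for the example.

```python
# Minimal backward chaining over propositional production rules.
# Assumed rule format: (conclusion, [premises]).
RULES = [
    ("mammal", ["has_hair"]),
    ("carnivore", ["mammal", "eats_meat"]),
    ("tiger", ["carnivore", "has_stripes"]),
]

def prove(goal, facts, rules=RULES, seen=frozenset()):
    """True if goal is a known fact or follows from a rule whose premises can all be proved."""
    if goal in facts:
        return True
    if goal in seen:               # guard against circular rules
        return False
    seen = seen | {goal}
    for conclusion, premises in rules:
        if conclusion == goal and all(prove(p, facts, rules, seen) for p in premises):
            return True
    return False

is_tiger = prove("tiger", {"has_hair", "eats_meat", "has_stripes"})
```

The search starts from the goal to be established and works back toward known facts, which is the sense in which such a system resembles a theorem prover.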

Work  has  also  continued  on  “automatic  programming”  and  its  relation 
to  theorem  proving.  Though  in  many  ways  automatic  programming  remains 
one  of  AI’s  most  difficult  challenges,  advances  have  been  made.  Barstow 
[1977]  describes  a  knowledge-based  system  for  automatic  programming, 
which  is  an  important  step.  Guiho  [1983]  illustrates  the  state  of  the  art  in 
proving that programs are correct. Boyer and Moore [1983] use theorem
proving  to  show  the  correctness  of  the  RSA  encryption  algorithm.  Reddy 
and  Jayaraman  [1983]  and  Naqvi  and  Henschen  [1983]  give  recent  exam¬ 
ples  of  methods  for  transforming  mathematical  problems  into  programs  that 
solve  them.  Balzer  [1981]  and  others  have  developed  a  high-level  language 
(GIST)  for  specifying  behavior  of  programs,  and  have  studied  the  conversion 
of  specifications  into  programs.  D.  R.  Smith  [1983]  has  presented  another 
interesting  paper  on  transformation  of  specifications  into  programs,  using  a 
“problem-reduction”  approach.  Mostow  [1983b]  describes  the  transforma¬ 
tion  of  specifications  into  VLSI  circuits. 

7.  SEMANTIC  INFORMATION  PROCESSING 

Chapter  7  concerns  the  ability  of  machines  to  use  languages,  and  in 
particular  to  process  the  “semantic  information”  (i.e.,  “meaning”)  of  sen¬ 
tences  in  languages.  AI  has  continued  its  progress  in  the  area,  of  course, 
building  on  the  research  summarized  in  Chapter  7. 

One  area  of  progress  has  been  in  the  study  of  how  “semantic  informa¬ 
tion”  can  be  represented  for  processing  by  computers.  This  study  is  also 
called  “knowledge  representation”  and  has  become  a  major  subfield  of  AI. 
The supplementary material for Chapter 3, above, summarizes AI’s progress
in the field of “knowledge representation” in general.

Many results have been obtained for the “syntax problem,” that is, how
computers can be made to parse natural (and artificial) languages. To mention
just a few: Kay [1980] describes a flexible method for defining
nondeterministic parsers, called the “active chart parsing” method; Marcus
[1980] shows that LR(k) parsing can be extended to give deterministic,
efficient parsing of English (see also Stabler [1983]). Gazdar [1983] reasons
persuasively that natural languages might be represented by generalizations of
context-free grammars.

Several  researchers  have  studied  how  AI  systems  can  use  knowledge 
about  the  direction  or  context  of  a  conversation  to  aid  in  understanding  sen¬ 
tences  within  the  conversation.  Much  of  this  work  has  relied  on  the  concept 
of “scripts” introduced by Schank and Abelson [1977]: a script is a stereotypical
sequence of possible situations and events. Pazzani [1983] gives a recent
example of an AI system that uses scripts to communicate interactively
with a human. Additional conceptual structures for representing dialogues
have been studied by Bobrow et al. [1977]. Still others have studied the use
of  hierarchy  and  recursion  in  structuring  texts  and  dialogues  (McKeown 
[1983],  Grosz  [1977],  and  Reichman  [1981]).  Hayes  and  Carbonell  [1983] 
have  studied  “metalanguage”  phrases  and  sentences,  which  refer  to  other 
phrases  and  sentences  in  the  same  dialogue. 

Chapter 7 notes that the meaning of a sentence is often more closely related
to the speaker’s goals than it is to the sentence itself. Earlier discussions
of  this  “speech-act  theory”  were  given  by  Austin  [1962]  and  Searle  [1969]. 
Recently,  researchers  have  developed  theories  of  how  systems  can  plan  the 
use of language to achieve goals (Cohen and Perrault [1979]). Appelt [1982]
describes  an  AI  system  that  plans  its  generation  of  English  sentences. 
Carberry  [1983]  describes  a  system  that  understands  a  speaker’s  goals  in 
order  to  answer  questions  appropriately. 

Major progress has been made in building computer systems that can
recognize and synthesize human speech. The HEARSAY projects demonstrated
AI’s viability for this task and also showed the value of interacting
problem solvers for different subdomains of the problem (Erman et al.
[1980]). Barr et al. [1981] survey work on speech understanding and in particular
discuss search methods used in programs for understanding speech.
Of special interest is the “beam-search” method, which was found to be useful
in the HARPY speech-understanding system: beam search is essentially a
breadth-first, nonbacktracking heuristic search that expands only high-scoring
nodes at each level, abandoning paths that encounter low-scoring
nodes (Newell [1978]). More recently, Huttenlocher and Zue [1983] describe
a set of phonological constraints that enable robust speech recognition.
Teja and Gonnella [1983] survey the technology of speech
synthesis.
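
The description of beam search above can be turned into a short sketch. The Python below is a generic illustration and makes no claim about how HARPY was actually implemented: the function signature, the `width` parameter, and the toy string-spelling problem are assumptions introduced for the example.

```python
import heapq

def beam_search(start, successors, score, is_goal, width, max_depth=50):
    """Breadth-first, nonbacktracking search keeping only the `width` best nodes per level."""
    beam = [start]
    for _ in range(max_depth):
        for node in beam:
            if is_goal(node):
                return node
        candidates = [child for node in beam for child in successors(node)]
        if not candidates:
            return None
        # Low-scoring candidates are dropped here and never revisited.
        beam = heapq.nlargest(width, candidates, key=score)
    return None

# Toy problem (hypothetical): spell a target string one letter at a time.
TARGET = "cab"

def successors(s):
    return [s + ch for ch in "abc"] if len(s) < len(TARGET) else []

def score(s):
    # Number of positions that already agree with the target.
    return sum(a == b for a, b in zip(s, TARGET))

result = beam_search("", successors, score, lambda s: s == TARGET, width=2)
```

Because abandoned paths are never reconsidered, a beam that is too narrow can miss a solution that an exhaustive breadth-first search would find; the width trades completeness for speed.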

Chapter  7  also  considers  the  advantages  of  interacting  networks,  or  col¬ 
lections,  of  question-answering  systems.  Barr  et  al.  [1981,  p.  343]  write  that 
this  approach,  known  as  the  HEARSAY  architecture  because  of  its  indepen¬ 
dent  use  by  Erman  et  al.  [1980],  has  been  of  great  value  in  AI  systems  for 
many  diverse  applications.  Stanfill  [1983]  has  implemented  a  particularly 
nice  example  of  this  approach  in  a  collection  of  interacting  expert  systems 
that  together  solve  problems  in  simple  mechanics.  The  collection  consists  of 
experts  for  subdomains  of  algebra,  linear  geometry,  solid  geometry,  “shape,” 
mechanics, pneumatics, and “qualitative relations.” Experts in higher-level
domains can access lower-level domains through queries. Jackson [1984] includes
and extends the “GQA/HEARSAY” concept in a design for a general-purpose
AI conceptual context.

AI  language  understanding  has  progressed  to  the  point  where  it  is  now 
becoming  a  frequent  computer  interface  for  some  applications,  especially 
for computer database systems. An English database interface (called “Intellect”)
was recently announced as a product for commercial databases by a
major  manufacturer  (Taylor  [1984]).  Montague  [1970]  formalized  a  theory 
of  syntax  and  semantics  for  natural  languages  that  has  been  the  basis  of 
much research on AI-database interfaces (Clifford [1983]). Reiter et al.
[1983]  summarize  the  present  state  of  the  field  of  artificial  intelligence  and 
databases,  and  the  lines  along  which  it  is  developing. 

The supplementary material for Chapter 2, above, discusses a question
some  philosophers  and  cognitive  psychologists  have  raised  about  the  basic 
premise  of  AI  semantic  information  processing,  namely  whether  intelligent 
“understanding”  can  really  be  equivalent  to  manipulating  symbols  and  data 
structures.  This,  of  course,  is  a  very  important  question  for  further 
thought. 

Again,  the  above  summary  can  cover  only  some  of  the  accomplish¬ 
ments  of  the  past  decade  in  this  research  area.  For  more  extensive  coverage, 
the reader is referred to Barr et al. [1981] and the various IJCAI Proceedings.
Rosenschein  [1983]  provides  an  insightful  overview  of  the  current  state  and 
probable  future  directions  of  natural-language  processing. 

8.  PARALLEL  PROCESSING  AND 
EVOLUTIONARY  SYSTEMS 

AI  research  has  now  clearly  demonstrated  the  value  of  parallel  process¬ 
ing  and  evolutionary  systems  in  several  domains.  The  supplements  to  Chap¬ 
ters  2  through  7,  above,  mention  several  examples.  I  shall  briefly  recapitu¬ 
late  and  add  to  these  examples.  However,  the  reader  should  first  be  referred 
to  an  excellent  collection  of  papers  on  parallel  processing,  edited  by  Kuhn 
and  Padua  [1981]. 

Regarding  evolutionary  systems,  little  more  need  be  written,  except 
that  successes  using  this  approach  have  been  noted  in  supplements  to  Chap¬ 
ters  2  through  7.  Much  of  this  work  is  based  on  the  work  of  Holland,  de¬ 
scribed  in  Chapter  8. 

It  should  be  expressly  noted  that  Chapter  8’s  discussion  of  parallel  sys¬ 
tems  in  terms  of  cellular  automata  and  Turing  machines  is  very  theoretical. 
Actual  parallel-processing  systems  have  been  based  on  more  practical  archi¬ 
tectures,  often  via  components  and  technologies  developed  for  serial  proc¬ 
essors.  For  example,  an  important  approach  has  been  the  construction  of  ar¬ 
rays  of  computer  processing-units.  More  flexible  (but  sometimes  less 
efficient) designs have avoided the array structure and enabled several computers
to share common memories and communications buses. Several examples
of these approaches are given in Kuhn and Padua [1981].

In  addition,  a  “dataflow”  architecture  for  parallel-processing  systems 
has  been  developed  that  departs  from  the  conventional  von  Neumann  logic 
used  in  serial  computers.  The  essence  of  the  dataflow  concept  is  that  instruc¬ 
tions  execute  whenever  data  flows  to  them,  with  data  normally  flowing  to 
multiple  instructions  at  once.  The  von  Neumann  design  thinks  of  control 
flowing serially from one instruction to the “next.” Dennis [1979] gives an
overview  of  the  dataflow  architecture  he  developed. 
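
The contrast with serial control flow can be seen in a small simulation. The following Python sketch is a software illustration only and does not describe Dennis's hardware design: the graph format, the instruction names, and the function `run_dataflow` are assumptions. Each instruction fires as soon as all of its operands are available, and instructions that become ready together fire in the same step.

```python
import operator

# Assumed graph format: instruction name -> (function, list of input names).
GRAPH = {
    "sum":  (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}

def run_dataflow(graph, inputs):
    """Fire every instruction whose operands have arrived; repeat until none remain."""
    values = dict(inputs)
    steps = []                     # names of the instructions fired together in each step
    pending = set(graph)
    while pending:
        ready = sorted(n for n in pending if all(i in values for i in graph[n][1]))
        if not ready:
            raise ValueError("some instructions never receive their operands")
        # Compute all ready instructions from the same snapshot of values,
        # which models their firing simultaneously.
        results = {n: graph[n][0](*(values[i] for i in graph[n][1])) for n in ready}
        values.update(results)
        pending -= set(ready)
        steps.append(ready)
    return values, steps

values, steps = run_dataflow(GRAPH, {"a": 5, "b": 3})
```

Here the data items `a` and `b` flow to two instructions at once, so `sum` and `diff` fire in the same step and `prod` fires in the next; no program counter decides the order.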

Parallel processing has found applications in many AI domains. Uhr
[1980]  gives  cogent  reasons  why  computer  vision  systems  should  be  struc¬ 
tured  as  serial  layers  of  large-scale  parallel-processing  systems.  Duff  [1976] 
describes  CLIP4,  an  image  processor  consisting  of  a  large-scale  array  of  mi¬ 
croprocessors.  Kruse  [1980]  describes  PICAP,  a  parallel  bit-slice  architecture 
for  image  analysis.  Fennell  and  Lesser  [1977]  discuss  the  use  of  parallelism 
in the HEARSAY-II speech-understanding system. Fahlman, Hinton, and
Sejnowski  [1983]  discuss  the  use  of  parallel  processing  for  several  AI  pattern- 
recognition  problems. 

As  Uhr  suggests,  it  now  seems  clear  that  large-scale  parallel  systems  will 
ultimately  achieve  enormous  computation  rates.  The  design  of  NASA’s 
“massively  parallel  processor”  array  of  16,384  microprocessors  predicted 
over  6  billion  additions  per  second  (Batcher  [1980]).  Jackson  [1979]  envi¬ 
sioned  the  creation  of  “very  large-scale  parallel”  (VLSP)  computer  systems, 
which  would  combine  100,000  or  more  microprocessors  in  a  single  system, 
yielding up to a trillion operations per second. Stolfo and Shaw [1982] describe
a tree-structure design for up to 100,000 microprocessors, specifically
oriented  to  the  parallel  execution  of  AI  production  systems. 

The  potential  applications  of  VLSP  systems  could  be  quite  profound, 
for  artificial  intelligence  as  well  as  for  other  applications  of  computers.  To 
appreciate  this,  consider  that  100,000  microprocessors  could  total  100  bil¬ 
lion  transistors,  while  the  human  brain  has  about  12  billion  neurons.  We 
may  imagine  the  most  natural  applications  of  such  systems  by  looking  at 
things  we  now  consider  impossible  and  asking  if  they  might  be  done  by 
100,000  computers  working  in  concert.  This  is  left  as  an  entertaining  exer¬ 
cise  for  the  reader. 
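
As a starting point for that exercise, the per-processor figures implied by the totals quoted in this section can be worked out directly. The short Python calculation below uses only the numbers given above; the variable names are illustrative, and “over 6 billion” is treated as exactly 6 billion, so the first result is a lower bound.

```python
# Per-processor rates implied by the aggregate figures quoted in the text.
mpp_processors = 16_384
mpp_additions_per_sec = 6e9                       # "over 6 billion", taken as a lower bound
per_processor_mpp = mpp_additions_per_sec / mpp_processors

vlsp_processors = 100_000
vlsp_ops_per_sec = 1e12                           # "up to a trillion operations per second"
per_processor_vlsp = vlsp_ops_per_sec / vlsp_processors

total_transistors = 100e9                         # "100 billion transistors"
transistors_per_processor = total_transistors / vlsp_processors

neurons = 12e9                                    # "about 12 billion neurons"
transistors_per_neuron = total_transistors / neurons

print(f"MPP: about {per_processor_mpp:,.0f} additions per second per processor")
print(f"VLSP: {per_processor_vlsp:,.0f} operations per second per processor")
print(f"VLSP: {transistors_per_processor:,.0f} transistors per processor")
print(f"Transistors per neuron: {transistors_per_neuron:.1f}")
```

The comparison of transistor and neuron counts is of course only one of size, and says nothing about how the two kinds of components are organized.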


9.  THE  HARVEST  OF  ARTIFICIAL 
INTELLIGENCE 

Chapter  9  discusses  the  general  applications  of  artificial  intelligence, 
concentrating  on  robotics  and  on  possible  future  consequences  of  AI  sys¬ 
tems.  In  many  ways  this  chapter  remains  extremely  relevant  to  present  re¬ 
search,  for  while  major  progress  has  been  made  in  the  development  of 
robotics,  major  questions  remain  regarding  the  future  of  AI. 

Indeed,  the  progress  in  robotics  has  intensified  our  appreciation  of  AI’s 
possible  consequences.  Public  concern  is  growing  about  increases  in  unem¬ 
ployment  caused  by  automation.  Japan  and  other  nations  are  developing 
factories  run  almost  entirely  by  robots.  And  Nilsson  [1983]  reminds  us  that 
AI  also  has  the  potential  to  perform  many  white-collar  jobs.  Coiffet  and 
Richard [1983] have provided several volumes on robotics alone, while
Ayres  et  al.  [1983]  have  studied  the  applications  and  social  implications  of 
robotics.  IJCAI8  contains  other  recent  papers  on  robotics. 

We  should  not  be  too  sanguine  that  AI  will  have  only  positive  conse¬ 
quences.  Rather,  we  should  carefully  note  three  trends  over  the  past  decade: 
the  price  of  computer  hardware  has  fallen  steadily  and  dramatically;  the 
power  of  computer  hardware  has  grown  just  as  steadily  and  dramatically;  AI 
research  has  made  steady  and  dramatic  progress  toward  the  goal  of  general- 
purpose  AI  systems  that  could  ultimately  program  themselves,  with  little 
need  for  human  programmers.  We  may  expect  all  of  these  trends  to  contin¬ 
ue,  and  it  is  difficult  to  be  sure  of  their  rates  of  change  and  technical  limits. 

If  researchers  are  largely  successful  in  emulating  human  intelligence 
with  computers,  and  if  the  hardware-cost  and  performance  trends  continue 
for  a  sufficiently  long  time,  then  it  is  conceivable  that  AI  systems  will  com¬ 
pete  against  the  human  work  force  throughout  our  economy,  and  for  jobs  of 
all  types  and  levels,  not  just  those  on  assembly  lines. 

This  would  not  happen  overnight,  if  it  happens  at  all.  But  it  might  hap¬ 
pen  more  quickly  than  we  expect.  For  example,  current  AI  systems  place  us 
on  the  brink  of  automating  a  basic  secretarial  task,  taking  dictation.  With 
other  jobs  it  may  take  decades  or  longer  before  machines  compete  for  them. 

It  may  be  that  economies  ultimately  provide  only  a  finite  number  of 
gainful  tasks  and  that  jobs  lost  to  automation  are  not  necessarily  replaced  by 
jobs  elsewhere.  If  AI  systems  do  cause  permanent  unemployment,  then  we 
should  consider  ways  to  insure  that  AI  will  support  those  it  removes  from 
work.  Duchin  [1983]  suggests  possible  mechanisms  for  this,  such  as  a  “nega¬ 
tive  income  tax.” 

If  we  can  develop  such  mechanisms,  then  AI  may  lead  to  a  very  positive 
future,  with  more  leisure  time  and  a  higher  standard  of  living  for  the  general 
public  (Boden  [1983]).  If  so,  then  artificial  intelligence  will  become  one  of 
our  most  humanistic  sciences. 

DOROTHY WOUND UP NUMBER ONE


DIRECTIONS  FOR  USING: 

For  THINKING:— Wind  the  Clock-work  Man  under  his 
left  arm,  (marked  No.  1.) 

For  SPEAKING:— Wind  the  Clock-work  Man  under  his 
right arm, (marked No. 2.)

For  WALKING  and  ACTION:— Wind  Clock-work  in  the 
middle  of  his  back,  (marked  No.  3.) 

N.B.-This  Mechanism  is  guaranteed  to  work  perfectly  for  a  thousand  years. 


Smith and Tinker’s mechanical man from Frank Baum, Ozma of Oz.

(with  permission) 

INTRODUCTION

“Artificial  intelligence”  is  the  ability  of  machines  to  do  things 
that people would say require intelligence. Artificial intelligence (AI)
research  is  an  attempt  to  discover  and  describe  aspects  of  human  intel¬ 
ligence  that  can  be  simulated  by  machines.  For  example,  at  present 
there are machines that can do the following things:

1.  Play  games  of  strategy  (e.g.,  Chess,  Checkers,  Poker)  and 
(in  Checkers)  learn  to  play  better  than  people. 

2.  Learn  to  recognize  visual  or  auditory  patterns. 

3.  Find  proofs  for  mathematical  theorems. 

4.  Solve  certain,  well-formulated  kinds  of  problems. 

5.  Process  information  expressed  in  human  languages. 

The  extent  to  which  machines  (usually  computers)  can  do  these 
things  independently  of  people  is  still  limited;  machines  currently  exhibit 
in  their  behavior  only  rudimentary  levels  of  intelligence.  Even  so,  the 
possibility  exists  that  machines  can  be  made  to  show  behavior  indicative 
of  intelligence,  comparable  or  even  superior  to  that  of  humans. 

Alternatively, AI research may be viewed as an attempt to develop
a  mathematical  theory  to  describe  the  abilities  and  actions  of  things 
(natural  or  man-made)  exhibiting  “intelligent”  behavior,  and  serve  as  a 
calculus for the design of intelligent machines. As yet there is no “mathematical
theory of intelligence,” and researchers dispute whether there
ever  will  be. 

This  book  serves  as  an  introduction  to  research  on  machines  that 
display intelligent behavior (note 1-1).¹ Such machines sometimes
will  be  called  “artificial  intelligences,”  “intelligent  machines,”  or  “me¬ 
chanical  intelligences.” 

The inclination in this book is toward the first viewpoint of AI research,
without forsaking the second. Since AI research is still in its
infancy,  it  is  therefore  prudent  to  withhold  estimation  of  its  future.  It  is 
best  to  begin  with  a  summation  of  present  knowledge,  considering  such 
questions as:

1. What is known about natural intelligence?

2.  When  can  we  justifiably  call  a  machine  intelligent? 

3.  How  and  to  what  extent  do  machines  currently  simulate  intel¬ 
ligence  or  display  intelligent  behavior? 

4.  How  might  machines  eventually  simulate  intelligence? 

5.  How  can  machines  and  their  behavior  be  described  mathe¬ 
matically? 

6.  What  uses  could  be  made  of  intelligent  machines? 

Each  of  these  questions  will  be  explored  in  some  detail  in  this 
book.  The  first  and  second  questions  are  covered  in  this  chapter.  It  is 
hoped  that  the  six  questions  are  covered  individually  in  enough  detail 
so  that  the  reader  will  be  guided  to  broader  study  if  he  is  so  inclined. 
For  parts  of  this  book,  some  knowledge  of  mathematics  (especially  sets, 
functions,  and  logic)  is  presupposed,  though  much  of  the  book  is  under¬ 
standable  without  it. 


TURING’S TEST

A basic goal of AI research is to construct a machine that exhibits
the  behavior  associated  with  human  intelligence,  that  is,  comparable  to 
the intelligence of a human being (note 1-2). It is not required that the
machine  use  the  same  underlying  mechanisms  (whatever  they  are)  that 
are  used  in  human  cognition  (note  1-3),  nor  is  it  required  that  the 
machine  go  through  stages  of  development  or  learning  such  as  those 
through  which  people  progress. 

The  classic  experiment  proposed  for  determining  whether  a  machine 
possesses intelligence on a human level is known as Turing’s test (after
A. M. Turing, who pioneered research in computer logic, undecidability
theory, and artificial intelligence). This experiment has yet to be performed
seriously, since no machine yet displays enough intelligent
behavior to be able to do well in the test. Still, Turing’s test is the basic
paradigm for much successful work and for many experiments in
machine intelligence, from Samuel’s Checkers Player to “semantic-information
processing” programs such as Colby’s PARRY or Raphael’s
SIR (see Chapters 4 and 7).

¹ The notes at the ends of chapters are for the benefit of the careful reader
and are intended to clarify questions that may arise in the text.


Figure  1-1.  A  diagram  of  Turing’s  test. 

Basically,  Turing’s  test  consists  of  presenting  a  human  being,  A, 
with a typewriter-like or TV-like terminal, which he can use to converse
with two unknown (to him) sources, B and C (see Fig. 1-1). The
interrogator  A  is  told  that  one  terminal  is  controlled  by  a  machine  and 
that  the  other  terminal  is  controlled  by  a  human  being  whom  A  has 
never  met.  A  is  to  guess  which  of  B  and  C  is  the  machine  and  which  is 
the  person.  If  A  cannot  distinguish  one  from  the  other  with  significantly 
better  than  50%  accuracy,  and  if  this  result  continues  to  hold  no  matter 
what  people  are  involved  in  the  experiment,  the  machine  is  said  to 
simulate human intelligence (note 1-4).
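
The phrase “significantly better than 50% accuracy” can be given a concrete reading with a simple statistical check. The Python sketch below is one possible interpretation, not part of Turing's proposal: it assumes a one-sided binomial test, the conventional 0.05 significance level, and the hypothetical function names `p_value_at_least` and `distinguishes`.

```python
from math import comb

def p_value_at_least(correct, trials, p=0.5):
    """Chance of `correct` or more right answers in `trials` guesses, each right with probability p."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(correct, trials + 1))

def distinguishes(correct, trials, alpha=0.05):
    """True if the interrogator's accuracy is significantly better than guessing (assumed level alpha)."""
    return p_value_at_least(correct, trials) < alpha

near_chance = distinguishes(55, 100)    # 55% correct in 100 trials
well_above = distinguishes(70, 100)     # 70% correct in 100 trials
```

Under these assumptions an interrogator who is right 55 times in 100 has not shown that he can tell the machine from the person, while one who is right 70 times in 100 has.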

Some comments on Turing’s test are in order. First, the nature
of Turing’s test is such that it does not permit the interrogator A to observe
the physical natures of B and C; rather, it permits him only to
observe their “intellectual behavior,” that is, their ability to communicate
with formal symbols and to “think abstractly.” So, while the test
does not enable A to be prejudiced by the physical nature of either
B or C, neither does it give a way to compare those aspects of an
entity’s behavior that reflect its ability to act nonabstractly in the real
world, that is, to be intelligent in its performance of concrete operations
on objects. Can the machine, for example, fry an egg or clean

Second, one possible achievement of AI research would be to produce
a complete description of a machine that can successfully pass
Turing’s test, or to find a proof that no machine can pass it. The complete
description must be of a machine that can actually be constructed.
A proof that there is no such constructible machine (it might say, e.g.,
“The number of parts in such a machine must be greater than the
number  of  electrons  in  the  universe.”)  is  consequently  to  be  regarded 
as  a  proof  of  the  “no  machine”  alternative. 

Third,  it  may  be  that  more  than  one  type  of  machine  can  pass 
Turing’s test. In this case, AI research has a secondary problem of
creating a general description of all machines that will successfully pass
Turing’s test.

Fourth,  if  a  machine  passes  Turing’s  test,  it  means  in  effect  that 
there  is  at  least  one  machine  that  can  learn  to  solve  problems  as  well  as 
a  human  being.  This  would  lead  to  asking  if  a  constructible  machine  can 
be  described  which  would  be  capable  of  learning  to  solve  not  only  those 
problems that people can usually solve, but also those that people create
but  can  only  rarely  solve.  That  is,  is  it  possible  to  build  mechanical 
intelligences  that  are  superior  to  human  intelligence? 

It  is  not  yet  possible  to  give  a  definite  answer  to  any  of  these 
questions. Some evidence exists that AI research may eventually attain
at  least  the  goal  of  a  machine  that  passes  Turing’s  test. 

It  is  clear  that  the  intellectual  capabilities  of  a  human  being  are 
directly  related  to  the  functioning  of  his  brain,  which  appears  to  be  a 
finite  structure  of  cells.  Moreover,  people  have  succeeded  in  construct¬ 
ing  machines  that  can  “learn”  to  produce  solutions  to  certain  specific 
intellectual  problems,  which  are  superior  to  the  solutions  people  can 
produce. The most notable example is Samuel’s Checkers Player, which
has  learned  to  play  a  better  game  of  Checkers  than  its  designer,  and 
which  currently  plays  at  a  championship  level  (see  Chapter  4). 

NATURAL INTELLIGENCE

The definition of “intelligence” in Webster’s Third International
Dictionary  (1966)  reads: 


in·tel·li·gence n -s often attrib [ME, fr. OF, fr. L intelligentia, fr.
intelligent- […] + -ia -y — more at INTELLIGENT]
1 a (1) : the faculty of understanding : capacity to know or apprehend :
INTELLECT, REASON <[…] which emerged during the […] matter as the highest
form yet achieved —Hermann Reith> <conceived of history as the expression
of a […]> […] Christian Science : the basic eternal quality of divine Mind
b : the available ability as measured by […] other social criteria to use
one’s existing knowledge to meet new situations and to solve new problems,
to learn, to foresee problems, to use symbols or relationships, to create
new relationships, to think abstractly : ability to […] environment, to deal
with it symbolically, to deal with it […], to adjust to it, to work toward
a goal […] alertness, awareness, or acuity : ability to use with awareness
the mechanism of reasoning whether conceived as a unified intellectual
factor or as the aggregate of many intellectual factors or abilities, as
intuitive or as analytic, as […] origin and nature
c : mental acuteness : SAGACITY, SHREWDNESS <did all he was asked to do
with ~ and great good humor>
2 a : an intelligent being […] <hierarchies of angelic ~s —S. F. Mason>
b […] <[…] is laid upon ~ than on editorials —Horace Greeley> <the joyful
~ that there is hope —Georgina Grahame> <from the engine-room voice tube
came ~ of more importance —M. S. Boylan> (2) : interchange of information
[…] <accused of maintaining ~ with the enemy> (3) obs : a piece of
information — used in pl. (4) archaic : common understanding or mutual
relations : ACQUAINTANCE, INTERCOURSE (5) : evaluated information
concerning an enemy or possible enemy or a possible theater of operations
and the conclusions drawn therefrom; also […] persons engaged in obtaining
such information : SECRET SERVICE <investigated me and told me I was
qualified for Navy ~ —[…] Murphy> <an ~ bureau> <available to American
and allied ~ organizations —L. W. Doob> syn see MIND
intelligence vt -ed/-ing/-s obs : to bring tidings of (something) or to
(someone)


(Reprinted by permission from Webster’s Third International Dictionary
© 1971 by G. & C. Merriam Co., Publishers of the Merriam-Webster
Dictionaries.)


To  summarize  the  definition  in  one  phrase,  one  might  say  that 
intelligence  is  the  ability  “to  act  rightly  in  a  given  situation.”  Although 
one could imagine an entity that always behaves “rightly,” without making
any errors, AI research is more concerned with the concept of partial
success,  with  building  machines  that  can  make  mistakes,  but  which  can 
also  change  their  behavior  with  time  and  perhaps  stop  making  mistakes. 
Intuitively, AI research is concerned with building machines that can
“adjust”  or  “adapt”  to  certain  environments,  and  which  in  effect  learn 
to  solve  problems  within  these  environments.  This  corresponds  with  the 
ordinary  conception  of  human  intelligence — that  it  is  limited,  but  that 
it  can  learn  and  thereby  improve  its  performance  of  certain  tasks  with 
time. 

Surprisingly  little  is  known  concerning  the  limitations  of  human 
intelligence.  No  one  has  made  any  complete  survey  of  the  problems  that 
can  be  solved  by  human  beings.  The  ability  to  solve  certain  types  of 
problems  has  been  studied  and  made  the  basis  of  “intelligence”  tests, 
but the generality and validity of these tests is disputable. Isaac Newton,
for  example,  might  have  scored  low  on  such  tests  when  he  was  an 
adolescent;  yet  he  is  estimated  by  some  to  have  had  an  intelligence 
quotient  (iq)  near  200.  One  of  the  shortcomings  of  these  tests  is  that 
they  predict  little  concerning  the  development  of  a  person’s  intelligence, 
especially  what  problems  he  could  learn  to  solve. 

Evidence concerning human intelligence can be obtained from four
major sources: history, introspection, the social sciences, and the
biological sciences. Included in the social sciences are psychology,
anthropology, sociology, economics, and political science; among the biological
sciences are neurobiology, biochemistry, and biology. “Introspective” sciences
might include mathematical logic, systems analysis, and music theory.

Evidence  from  History 

A  discourse  on  the  full  history  of  human  intelligence  is  certainly 
beyond  the  bounds  of  this  book.  Some  allusions  to  this  history  can  be 
woven  in  while  presenting  evidence  from  other  sources. 


Evidence  from  Introspection 

Introspection has yielded a wealth of seemingly ambiguous and
contradictory views of intelligence. One important introspective work
familiar in the Western world is Descartes’ Discourse on Method. This
work purports to be ultimately based only on the notion of thought: “I
think therefore I exist.” So far as the work concerns intelligence,
Descartes made a clear distinction between animals and human beings.
Animals, he believed, are not much different from machines; anything
an animal can do he could imagine being done by a sufficiently
complicated machine. People, however, are different from either animals or
machines, since people have an ability to “communicate” with each
other, to use signs, sentences, and languages that are clearly not
completely the result of instinct or construction. Descartes regarded the
ability to use languages as the most significant indication that something
has human intelligence: “. . . for the word is the sole sign and the only
certain mark of the presence of thought hidden and wrapped up in the
body. . . .”²

Descartes  was  partially  correct  in  his  observation  that  animals  can¬ 
not  communicate  in  the  same  fashion  as  people.  There  is  recent  evidence 
that  dolphins  have  some  sort  of  language,  but  the  nature  of  their  lan¬ 
guage  is  still  not  understood  (Lilly,  1967).  Chapter  7  explores  the 
relationship  of  intelligence  and  language. 

Another  introspective  way  of  looking  at  the  mind  is  that  provided 
by  the  “rooms  of  consciousness”  concept.  In  this  system  a  human  mind 
is  viewed  as  being  able  to  inhabit  and  move  among  a  set  of  rooms,  which 
are distinguished from each other by their lighting; Socrates’ metaphor
of the Cave in Plato’s Republic is a good example. Various rooms can
be associated with different levels and abilities of intelligence; this
introspective metaphor has been developed in Eastern cultures by Buddha
and Lao Tse, as well as in the Western world by other philosophers.
Also, the significance of “light” in the metaphor is typical.³ Other
variations  on  the  metaphor  speak  of  some  rooms  as  possessing  illusions 
and  dreams. 

One viewpoint of intelligence, which is often developed by
introspection, is that there is a distinction between scientific (intellectual)
learning and spiritual learning abilities. Scientific learning is said to rely
on certain rules for the belief, derivation, refutation, and proof of
propositions about the universe. Presumably, science requires a language for
describing events and the meanings of measurements, and is dependent
on the existence of invariant, reproducible things in the universe.
“Spiritual” learning, on the other hand, does not require words or
language and may evade intellectual reasoning processes. For various
people, introspection has yielded, for example, the following notions of
nonintellectual learning:

1. Subconscious learning, in which knowledge is somehow
obtained without conscious reasoning.

2. Emotional learning, in which knowledge is perceived as an
emotion, without reasoning.

3. Inspired learning, in which knowledge is given to one
instantaneously, without reasoning, perhaps by a deity.

4. Paradoxical learning, in which one is able to perceive
knowledge that is self-contradictory, regardless of how it is
expressed in words, and therefore beyond logical or scientific
learning.

2 From a letter of 1647 to Henry More, translated by L. C. Rosenfield in
the Annals of Science, Vol. 1, No. 1 (1936). Descartes did not claim that animals
are machines; he said that they do possess “life” and “feeling.”

3 From a physical standpoint the relation of light to intelligence seems to be
simply that light waves (electromagnetic radiation) are the fastest means for
transmitting information.

Again,  this  introspective  viewpoint  has  been  developed  both  in 
Eastern  and  Western  cultures.  The  reader  who  wishes  to  study  the 
subject  deeply  may  wish  to  read  Dostoevsky,  Freud,  Jung,  and  Lao 
Tse.  Various  people  have,  of  course,  argued  that  emotional  and  sub¬ 
conscious  learning  can  be  scientifically  explained. 

The  viewpoint  that  intelligence  in  certain  forms  cannot  be  ex¬ 
plained  logically  or  scientifically  is  relevant  to  artificial  intelligence  re¬ 
search.  If  this  viewpoint  is  correct,  then  presumably  there  are  some 
types  of  knowledge  that  machines  cannot  be  said  to  possess  and  there 
are  some  ways  of  gaining  knowledge  they  cannot  use.  Chapter  2  dis¬ 
cusses  the  nature  of  machines  and  of  scientific  and  mathematical  de¬ 
scriptions  of  things  more  thoroughly.  For  now,  the  viewpoint  expressed 
there  is  that  while  it  can  be  argued  mathematically  that  there  are  entities 
which  cannot  be  completely  described  mathematically,  there  is  probably 
no  way  of  proving  in  the  real  world  that  something  is  beyond  the  power 
of  science  to  explain.  All  that  can  be  proved  is  that  science  has  so  far 
not  explained  it. 

Thus,  no  comment  is  made  here  as  to  the  existence  or  nature  of 
spiritual  learning:  What  is  important  is  whether  there  are  some  forms  of 
learning  and  intelligence  that  can  be  exhibited  by  machines.  Whether 
“some”  means  “all”  is,  scientifically  speaking,  an  open  question. 

Perhaps not surprisingly, introspection as a technique for gaining
knowledge about intelligence often seems to yield only “circular”
questions (Can one learn how to learn? If one knows something, does one
know that he knows it?). Even so, introspection is probably the source
most commonly used in artificial intelligence research for information
about specific problem-solving abilities of human intelligence. Most
researchers use their own experience at having solved problems whenever
they are attempting to make a machine solve one; usually if you are
going to try to design a machine that does something, it is a good idea
to try doing it yourself first and see what happens.

This  does  not  mean  that  your  machine  will  wind  up  imitating  the 
human  approach  to  the  problem.  Actually,  machines  will  often  work 
more  efficiently  on  certain  problems  when  they  operate  in  ways  that 
may seem quite foreign to human reasoning patterns. AI research is
concerned with finding machines that simulate the abilities of human
intelligence, that is, with finding machines that reproduce the outward
abilities of human intelligence, though not necessarily the inner means
people  use  to  achieve  these  abilities. 

Probably  the  major  advantage  to  using  introspection  in  artificial 
intelligence  research  is  simply  that  it  can  give  the  researcher  an  idea  of 
the  information  relevant  to  the  problem  he  is  trying  to  make  a  machine 
solve.  One  of  the  innate  abilities  of  intelligent  creatures  seems  to  be 
an  ability  to  discard  large  amounts  of  information,  and  focus  only  on 
that  which  is  “relevant.” 


Evidence  from  the  Social  Sciences 

The  evidence  from  the  social  sciences  concerning  human  intel¬ 
ligence  is  scanty.  Only  a  few  general  things  are  known  with  certainty. 

1. Human intelligence is a species-wide trait; there does not seem
to be any clear distinction between the innate learning and
problem-solving abilities of infants belonging to the various races. Thus, a normal
child, properly raised, can learn the language of any human culture,
regardless of the language spoken by his biological parents.

2. The intelligence of an individual develops with time and is
strongly affected by the nature of his environment. For example,
identical twins (who have, barring mutations, the same genetic endowment)
raised in different environments have been found to show differences
in their intelligence quotients as great as 24 points.

3. The intelligence of an individual is also strongly affected by
his heredity. Thus, identical twins raised in approximately the same
environment tend to show less difference in their IQs than do other types
of siblings.

4. The intelligence of an individual may vary with respect to
different problem domains; we express this by saying that different
individuals may have different “aptitudes.”

Experiments  performed  by  Piaget  (1946  et  seq.)  and  others  have 
shown  that  the  intelligence  of  a  child  develops  in  stages.  Precisely  why 
this  is  so  is  unknown,  but  it  seems  clear  that  these  stages  do  exist  and 
that  the  child  must  accumulate  sufficient  experience  operating  within 
each  stage  before  he  can  progress  completely  to  the  next.  Piaget  dis¬ 
tinguished  four  stages:  sensori-motor,  preoperational,  concrete  opera¬ 
tional,  and  formal  operations. 

Sensori-motor Stage. This stage lasts for the first year and a half
to two years of the individual’s life. During this stage he makes the
transition from using only his instinctive abilities to developing an
elementary ability to reason causally and use signals. By the eighth
week of an infant’s life he is able to discriminate visually between
different depths and orientations of objects and to visually perceive objects
as  having  constant  size  and  shape,  even  when  they  are  receding  and 
rotating.  After  about  the  eighth  month  a  baby  can  understand  that  a 
rattle  will  shake  only  when  he  pulls  on  a  string  attached  to  the  rattle. 
Also,  after  the  eighth  month,  an  infant  develops  vocal  and  bodily 
gestures  that  refer  to  events  and  objects  in  his  environment:  he  will, 
for  example,  develop  facial  expressions  and  learn  to  make  sounds  that 
represent  things  he  desires  or  wishes  to  avoid. 

Preoperational  or  Symbolic-Operational  Stage.  This  stage  lasts 
roughly  from  the  second  to  the  seventh  year  of  the  child’s  life.  During 
this  stage  the  child  learns  the  basic  vocabulary  of  the  language  of  his 
culture,  and  develops  an  ability  to  describe  events  in  sentences  (prior  to 
this  stage,  he  describes  events  with  a  single  word).  Also  during  this  stage 
the  child  conducts  extensive  experiments  in  his  environment  and  learns 
many different causal relationships. Most of his experimenting is,
however, intuitively guided, as is also the way he describes things. If a
child  in  this  stage  is  asked  what  a  jar  is,  he  might  say,  “There’s  lemonade 
in  it.”  Although  he  can  distinguish  between  “all”  and  “some,”  his 
ability  to  express  the  distinction  is  limited:  If  he  is  shown  a  bouquet 
of  flowers,  only  some  of  which  are  roses,  and  asked  whether  there  are 
more  roses  or  more  flowers,  he  will  typically  respond  that  there  are 
more  roses.  Toward  the  end  of  this  stage  a  child  can  be  taught  to  read 
and  write. 

Concrete-Operational  Stage.  From  age  seven  to  age  eleven  the  child 
is  able  to  make  very  significant  generalizations  of  his  notions  of  causality. 
In  particular,  he  is  able  to  recognize  the  concepts  of  invariance,  reversi¬ 
bility,  and  conservation.  Prior  to  this  stage,  a  child,  when  shown  two 
“congruent”  glasses  filled  with  the  same  amount  of  water,  will  say  that 
they  have  the  same  amount  of  water;  but  if  the  water  from  one  glass  is 
then  poured  into  a  taller,  thinner  glass,  he  will  say  that  the  taller,  thinner 
glass  has  more  water.  Only  when  he  reaches  the  concrete-operational 
stage  does  he  (evidently)  realize  that  both  glasses  have  the  same 
amount  of  water,  regardless  of  their  shape.  Also  in  this  stage  of  develop¬ 
ment,  when  presented  with  a  bouquet  of  flowers  only  some  of  which 
are  roses,  a  child  will  say  there  are  more  flowers  than  roses. 

Formal  Operations  Stage.  From  the  age  of  eleven  upward,  the 
individual  becomes  able  to  operate  logically  with  the  form  of  an  argu¬ 
ment,  independently  of  its  meaning;  that  is,  he  recognizes  factors  in¬ 
volved  in  an  event  and  plans  experiments  that  will  give  him  knowledge 
about  it.  It  is  in  this  stage  that  the  individual  appears  to  develop  a 
proficiency  at  reasoning  abstractly  with  words. 

Some caveats concerning these four stages should be stated. First,
very little is known concerning the emotional and subconscious
development of a person’s intelligence. Second, there are exceptions to the rate
at  which  children  go  through  these  stages:  Mozart,  for  instance,  could 
play  the  piano  and  compose  proficiently  at  the  age  of  five.  Gauss  taught 
himself  to  read,  could  do  complicated  arithmetic  when  he  was  three,  and 
had  certainly  reached  the  formal  operations  stage  by  the  time  he  was 
eight  or  nine. 

Another set of basic facts about intelligence and learning is
that developed by behavioristic psychology. Behaviorist psychologists
have  attempted  to  understand  intelligent  behavior  by  treating  their 
subjects  as  “black  boxes,”  presenting  them  with  certain  standardized 
situations  and  then  recording  their  reactions.  They  have  been  able  to 
demonstrate  certain  phenomena  repeatedly  in  several  different  species, 
including  man. 

The results best known involve learning experiments in the form of
the traditional “classical conditioning” and “instrumental,” or “operant,”
conditioning. In both cases a conditioned stimulus (CS) that has neutral
intrinsic value to the animal (e.g., a light flash) is temporally paired
with an unconditioned stimulus (UCS) that has a preexisting reward or
pain value. In classical conditioning the CS is followed by the UCS
regardless of the animal’s response (e.g., Pavlov’s induction of salivation in
dogs when a bell was rung). In instrumental conditioning the subject’s
response to the CS determines whether he receives the UCS. Findings
concerning learning in these situations include (Thompson, 1967):

1. Up to a point, the stronger the UCS, the more rapid is the
conditioning.

2. The most effective time relations for classical conditioning
appear to be when the CS begins about a half-second prior to the
UCS. As the time between CS and UCS increases, the efficiency of
the conditioning decreases.

3. The greater the time between trials, the fewer the trials required
for conditioning.

4. If the CS is repeatedly given without the UCS after conditioning
has occurred, the conditioned response will extinguish, or die out.

5. Following extinction, the conditioned response to the CS will
exhibit spontaneous recovery in the absence of UCS presentations.

6. If reinforcement is given only in some of the trials, conditioning
occurs more slowly, but is more resistant to extinction.

7. If an additional neutral stimulus is temporally paired with the
CS after conditioning, it will subsequently elicit conditioned
responses.

8. If conditioning and extinction series are repeated, both processes
will occur progressively more rapidly.
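
Two of these findings, faster conditioning with a stronger UCS (finding 1) and extinction when the CS is given alone (finding 4), can be illustrated with a small simulation. The sketch below is not taken from the text: it assumes a simple linear learning rule in which each trial moves an "associative strength" a fixed fraction of the way toward the UCS intensity. The rule, the learning rate, the criterion, and all function names are assumptions chosen for illustration. The rule has no ceiling, so it does not show the "up to a point" limit in finding 1, and it does not reproduce spontaneous recovery, partial-reinforcement effects, or the speed-up over repeated series (findings 5, 6, and 8).

```python
def conditioning_trial(strength, ucs_intensity, learning_rate=0.2):
    """Run one trial and return the updated associative strength.

    The strength moves a fixed fraction of the way toward the UCS
    intensity. Passing ucs_intensity=0.0 models a trial in which the
    CS is presented alone (an extinction trial).
    """
    return strength + learning_rate * (ucs_intensity - strength)


def trials_to_criterion(ucs_intensity, criterion=0.5, max_trials=1000):
    """Count the CS-UCS pairings needed before strength reaches the criterion.

    Returns None if the criterion is not reached within max_trials.
    """
    strength = 0.0
    for trial in range(1, max_trials + 1):
        strength = conditioning_trial(strength, ucs_intensity)
        if strength >= criterion:
            return trial
    return None


def extinguish(strength, n_trials):
    """Present the CS alone n_trials times and return the remaining strength."""
    for _ in range(n_trials):
        strength = conditioning_trial(strength, 0.0)
    return strength


if __name__ == "__main__":
    for intensity in (0.6, 1.0, 2.0):
        print(intensity, trials_to_criterion(intensity))
    print(extinguish(1.0, 10))
```

Under this rule a stronger UCS reaches the criterion in fewer pairings, and repeated CS-alone trials drive the strength back toward zero.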

Behaviorist  psychologists  postulate  that  these  forms  of  condition¬ 
ing  underlie  all  forms  of  intelligent  adaptation,  or  learning.  They  have 
had  difficulty,  however,  in  analyzing  the  development  of  relatively  com¬ 
plex  problem-solving  behavior  (such  as  that  described  by  Piaget’s 
findings).  As  yet,  there  is  no  very  detailed  explanation  for  the  develop¬ 
ment  and  abilities  of  human  intelligence  in  terms  of  classical  and 
operant  conditioning. 


Evidence from the Biological Sciences⁴


State  of  Knowledge.  If  a  really  detailed  explanation  for  the  in¬ 
dividual  human  intelligence  were  to  be  given,  it  might  well  require  a 
complete  description  of  the  human  brain.  Biologists  are  a  long  way 
from  anything  approaching  such  a  description.  This  section,  however, 
will  present  an  overview  of  current  knowledge  and  nescience,  since  for 
the  person  doing  active  research  in  artificial  intelligence  it  is  important 
to  have  such  a  summary. 

The Neuron and the Synapse. The human brain contains
approximately 12 billion nerve cells, or neurons. It has been shown that
each cell has from 5600 to 60,000 dendritic connections (incoming
signal carriers); consequently, each must have equivalent numbers, on
the average, of axonal branches (outgoing signal carriers) contacting
other neural cells (Cragg, 1967). Such numbers may indicate a storage
and processing capability several orders of magnitude greater than that of
current computers; the estimate remains uncertain because we know so
little about the functions that can be executed by neurons.

The neuron is qualitatively quite different from “on-off”
components of current computers. An idealized neuron is shown in Fig. 1-2.

The  armlike  projections  from  the  cell  body,  or  “soma,”  are  called 
dendrites.  Axons  from  other  nerve  cells  contact  the  soma  and  dendrite 
proper  or  the  dendritic  spines  (small  projections  from  the  dendritic 
surface) by means of synapses (see Fig. 1-3). It is believed that axons
synapsing  with  the  dendritic  or  soma  surfaces  are  inhibitory  and  those 
synapsing  on  the  dendritic  spines  are  excitatory  to  the  neuron  receiving 
their  signals. 

Impulses transmitted at the synapses add to or subtract from the
magnitude of the voltage fluctuations that slowly wax and wane over
the membrane of the soma. The electric currents are the result of a
change in the potential difference between the inside and outside of the
cell body, caused by a disequilibrium of charged ions across the cell
membrane (see below). If the summation of the additions and
decrements to this current reaches a certain value (about 10 millivolts), an
impulse is fired down the neuron’s axon. Most neurophysiologists
believe that the impulse is initiated at the axon hillock (the interface
between the soma and the axon). However, there is recent evidence that in
certain mollusk cells the impulse may be initiated inside the cell, and
may not be a direct consequence of the soma’s integrated slow waves
(Pribram, 1971).

4 I am indebted to my friend and colleague
adapt this section from an unpublished paper.

An electric impulse is propagated down the axon at a few feet per
second; this propagation is based on a nerve membrane potential. The
nerve membrane is a barrier composed of lipids (e.g., fats), proteins,
and sugars, which selectively prevents large molecules and certain ions
from entering or leaving the neuron. It selectively screens out sodium
ions, and is freely permeable to potassium ions. This creates a cation
excess outside the membrane, which opposes tendencies of the
potassium ions to equilibrate the charge or equilibrate the potassium
concentration on both sides of the membrane.

Consequently, not enough K⁺ ions move inside to compensate
for the large number of Na⁺ ions on the outside of the cell, and this
causes a potential difference across the membrane of about −70
millivolts. Initiation of the impulse at the axon hillock consists of a small
10 millivolt change in the membrane potential, which causes the
breakdown of the Na⁺ barrier, the influx of Na⁺ ions, and the efflux of
K⁺ ions, and the consequent change in the nerve membrane potential to
+40 millivolts. Immediately after these changes, enzymes embedded in
the membrane “pump” the Na⁺ out of the cell and readjust it to the
resting potential. This initiation triggers a similar breakdown in the
adjacent membrane, and so the electric signal is carried down the axon.
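
The way a local breakdown triggers the adjacent membrane can be sketched as a row of membrane segments, each of which is resting, firing, or recovering. This is only an illustrative sketch. The potentials of −70 and +40 millivolts come from the passage above, but the division of the axon into discrete segments, the one-step recovery period (standing in for the pumping of Na⁺ back out of the cell), and the names `propagate`, `REST_MV`, and `PEAK_MV` are assumptions introduced here.

```python
REST_MV = -70.0  # resting membrane potential, from the text
PEAK_MV = 40.0   # potential reached during the impulse, from the text


def propagate(n_segments, n_steps):
    """Simulate an impulse travelling along a row of membrane segments.

    Segment 0 starts out firing. On each step a resting segment fires
    if a neighbouring segment fired on the previous step; a firing
    segment then spends one step recovering (an assumed stand-in for
    the Na+ pump) before it returns to rest.

    Returns one list of potentials (in millivolts) per time step.
    """
    REST, FIRING, RECOVERING = 0, 1, 2
    state = [REST] * n_segments
    state[0] = FIRING
    history = []
    for _ in range(n_steps):
        history.append([PEAK_MV if s == FIRING else REST_MV for s in state])
        next_state = []
        for i, s in enumerate(state):
            if s == FIRING:
                next_state.append(RECOVERING)
            elif s == RECOVERING:
                next_state.append(REST)
            else:
                neighbours = state[max(i - 1, 0):i] + state[i + 1:i + 2]
                next_state.append(FIRING if FIRING in neighbours else REST)
        state = next_state
    return history


if __name__ == "__main__":
    for snapshot in propagate(5, 5):
        print(snapshot)
```

Because a segment that has just fired is still recovering when its neighbour fires, the impulse in this sketch moves in one direction only, away from the point of initiation.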

The amplitude and speed of the impulse are functions of the axon
diameter, whereas the frequency is a result of the soma’s integration of
incoming stimulations and the consequent “decisions” to fire
(Thompson, 1967, pp. 129-163).

Many of the longer axons are myelinated, that is, they possess a
sheath of fat surrounding them which greatly speeds conduction and
insulates the axon from neighboring electrical activity. After multiple
branchings,  the  axons  become  smaller  in  diameter  and  unmyelinated; 
when  they  reach  another  cell,  they  are  quite  small  and  the  current  is  of 
low  amplitude  and  going  more  slowly.  Here  it  is  possible  that  the 
electric  potentials  of  neighboring  axons  from  different  neurons  might 
interact,  either  potentiating  or  damping  local  electrical  activity. 

The interface between the axon and dendrite of the contacting
cells is the synapse. The impulse is transmitted across the “synaptic
cleft” by chemical transmitters such as acetylcholine, norepinephrine,
dopamine, serotonin, and certain amino acids. Different transmitters
predominate in anatomically and functionally different portions of the
brain and spinal cord. Acetylcholine, norepinephrine, and dopamine
have been shown to be packaged in very small vesicles in the
presynaptic membrane. On being activated by an impulse, these vesicles
extrude the transmitter into the synaptic cleft, where it crosses the 100
angstrom distance to combine with specific receptors on the
post-synaptic membrane. This combination effects the opening of ionic
gates, which cause either an increment or decrement in the general
activity of the post-synaptic neuron. Excess transmitter is either
destroyed or taken up again by the presynaptic bouton to prevent flooding
of the post-synaptic receptors and allow the synapse to prepare itself for
the next synaptic transmission (Thompson, 1967, pp. 111-128, 192-209;
Weiner, 1971).

Until recently it had been hypothesized that all synapses between
a neuron and its follower neurons had the same presynaptic transmitter,
were functionally the same (excitatory or inhibitory), and had receptors
that opened up only one kind of ionic gate. However, work with
Aplysia, a sea slug with conveniently large neurons and a simple nervous
system, revealed several neurons that could both excite and inhibit
their “follower” cells. These neurons all used acetylcholine as their
transmitters. At the synapses that were excitatory, acetylcholine
combined with the post-synaptic receptors to open Na⁺ ion gates, whereas
at the inhibitory synapses acetylcholine combined with the receptors to
open Cl⁻ ion gates. One of these multiaction neurons had a follower
cell that had both kinds of receptors in the post-synaptic membrane.
Here the rate of stimulation determined whether the excitatory or the
inhibitory ionic gates would predominate. Acetylcholine stimulated a
third type of receptor to open up K⁺ ionic gates that caused a longer
lasting inhibition than the chloride gates had caused. Such work has
shown that neurons with a single type of transmitter can have a variety
of effects on their follower cells because the determination of the
resultant effects of neural transmission is a function of the differences
in the post-synaptic receptors and the ionic gates that are opened
(Kandel, 1970; Gardner & Kandel, 1972).
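
The point that the effect depends on the receptor rather than on the transmitter can be restated as a small lookup. This is a minimal sketch: the three gate types and their effects are those reported above for Aplysia, while the `Receptor` class, the `GATE_EFFECT` table, and the function name are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass

# Effect of opening each ionic gate, as reported in the text for Aplysia.
GATE_EFFECT = {
    "Na+": "excitation",
    "Cl-": "inhibition",
    "K+": "long-lasting inhibition",
}


@dataclass(frozen=True)
class Receptor:
    """A post-synaptic receptor, identified by the ionic gate it opens."""
    gate: str


def postsynaptic_effect(transmitter, receptor):
    """Return (transmitter, gate, effect) for one synaptic transmission.

    The effect is looked up from the receptor's gate alone; the
    transmitter is carried along only to show that it does not
    change the outcome.
    """
    return (transmitter, receptor.gate, GATE_EFFECT[receptor.gate])


if __name__ == "__main__":
    for gate in ("Na+", "Cl-", "K+"):
        print(postsynaptic_effect("acetylcholine", Receptor(gate)))
```

The same transmitter, acetylcholine, yields three different effects because three different receptors are supplied.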

Why, then, does the mammalian CNS have so many different
transmitters? It has been shown that stimulation by cholinergic neurons
(those that use acetylcholine as their transmitters) and serotonergic
neurons (those that use serotonin) causes certain hormonal-like changes
in the follower neurons. Serotonergic stimulation causes a rise in c-AMP,
a mediator common to many hormones, in the follower cell. Cholinergic
stimulation causes a rise in the phosphatidylinositol of the follower cell.
Eric Kandel has hypothesized that post-synaptic membranes may have
two different classes of receptors. The ionophoric receptors bind with
a common transmitter to open different ionic gates, thereby affecting the
post-synaptic membrane potential; they are receptor-specific. The
chemophore receptors combine with different transmitters to cause
metabolic changes in the follower cell. The actions of these receptors
are transmitter-specific (Fig. 1-4). The demonstration that neurons
affect one another’s metabolic as well as electric states has brought to
light an entirely new dimension in interneuronal communication.

Figure 1-4. Transmitter-specific receptors (pre-synaptic bouton and
post-synaptic receptors).

The most striking aspect of the neuron is its multi-input,
single-output character. The slow potential on the soma apparently indicates
a comparison of dendritic inputs on the basis of the temporal,
structural, and qualitative nature of the synaptic input that results in the
all-or-none  decision  to  fire.  Whether  this  comparison  is  solely  a  function 
of  electrical  interactions,  or  reflects  molecular  conformations  of  the 
membrane  (Barondes,  1970),  or  is  also  modified  by  some  mechanism 
inside  the  cell  is  an  open  question. 
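
The multi-input, single-output character described above can be sketched as a simple threshold unit. This is a deliberate simplification under stated assumptions: the 10 millivolt firing value is the figure given earlier in this section, but the per-synapse contributions in the example are hypothetical, and the temporal, structural, and metabolic factors the text mentions are ignored. The function name `fires` is introduced here for illustration.

```python
FIRING_THRESHOLD_MV = 10.0  # change in membrane potential needed to fire, from the text


def fires(excitatory_mv, inhibitory_mv, threshold_mv=FIRING_THRESHOLD_MV):
    """Make an all-or-none decision from many synaptic inputs.

    excitatory_mv and inhibitory_mv are lists of the (hypothetical)
    contributions, in millivolts, of individual synapses. Excitatory
    inputs add to the slow potential on the soma and inhibitory inputs
    subtract from it; the unit fires when the net change reaches the
    threshold.
    """
    net_change = sum(excitatory_mv) - sum(inhibitory_mv)
    return net_change >= threshold_mv


if __name__ == "__main__":
    print(fires([4.0, 4.0, 4.0], [1.0]))  # net change of 11 mV
    print(fires([4.0, 4.0, 4.0], [3.0]))  # net change of 9 mV
```

However many inputs arrive, the result is a single yes-or-no output, which is the sense in which the neuron is said to make a "decision" to fire.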

Biological Memory. Of fundamental importance to any system that
wishes  to  modify  its  behavior  on  the  basis  of  experience  is  an  efficient 
memory  storage  and  retrieval  system. 

It  appears  that  there  are  multiple  stages  in  the  development  of  a 
memory  and  its  means  of  retrieval.  Demonstration  of  how  memory 
might  function  has  come  from  psychological  and  biological  experimenta¬ 
tion,  clinical  observations  of  memory  dysfunctions,  and  attempts  to 
mimic  the  structure  and  function  of  human  memory  by  computer 
simulation.  These  differing  approaches  to  the  study  of  memory  have 
caused  some  confusion,  especially  in  the  meaning  of  such  terms  as 
short-  and  long-term  memory.  As  will  be  seen  below,  caution  should 
be  used  in  interpreting  what  an  author  means  by  such  terms. 

Psychologists have experimentally identified three types of
memory. Sensory information storage (SIS) is measured in tenths of a second.
It serves to retain fleeting sensory data until the central nervous system
(CNS) can process it. The SIS system results in the after-images you
see when rapidly opening and closing your eyes. The SIS retains more data
than the central nervous system can process during the short duration
of the SIS trace. The CNS rapidly scans the SIS trace and retains the
data of most interest to the perceiver.

Short-term memory (STM) as determined by various psychological
experiments lasts about 30 seconds. A subject asked to remember three
words for 18 seconds does so with ease. However, if asked to rapidly
subtract 3’s from a randomly assigned number during the intervening
18 seconds, and then asked to recall the words, most subjects will not
remember them. It is believed that the serial subtractions interfere with
any subvocalized rehearsal of the words and with the consolidation of the
three words to long-term memory (LTM). If a subject is given a series of
30 words at the rate of one per second, and asked to recall them
immediately afterward, he remembers the beginning and end of the list
best. If he is asked to subtract serial 3’s immediately after seeing the
list, the tail end of the curve disappears. Here, then, is a demonstration
of which parts of the learning curve are a function of long-term as
opposed to short-term memory.

A  major  part  of  the  psychologist’s  investigation  of  long-term 
memory  has  centered  around  the  use  of  computer  simulations;  much 
of  this  will  be  covered  later.  For  an  excellent  overview  of  the  psycholo¬ 
gist’s  approach  to  memory  and  mind  functioning,  see  Lindsay  and 
Norman  (1972). 

When  an  individual  suffers  a  fairly  hard  blow  to  the  head,  he 
often  cannot  remember  events  immediately  preceding  his  accident.  This 
phenomenon  is  called  retrograde  amnesia;  it  may  begin  with  loss  of 
memory  for  several  hours  or  days  prior  to  the  accident.  The  earlier 
memories usually return first, followed by the later ones, until only events
30 to 60 seconds prior to the trauma cannot be remembered (Jarvik,
1972). Memories following the accident (anterograde memory) are
likewise  impaired  and  are  more  refractory  to  recovery.  Such  phenomena 
have  also  been  noted  in  psychiatric  patients  who  undergo  electro¬ 
convulsive  shock  therapy  (ecs).  These  observations  fostered  the  idea 
that  short-term  memory  traces  were  transient  electric  events  that 
eventually  consolidated  into  long-term  memories  through  chemical  and 
biological  changes  in  the  brain. 

Normally, a rat placed on a pedestal in a cage with an electrified
grid floor needs only one or two trials to learn not to jump down from
the pedestal. However, if ecs follows these learning trials, the rat will
not learn to avoid either the electrified grid or the added negative
experience of going through ecs (Deutsch, 1969). The longer the interval
between the administration of the learning task and the ecs, the smaller
the effect on long-term retention. However, investigators differ on
how  long  after  the  learning  trials  the  ecs  is  effective  in  preventing  long¬ 
term  retention.  Some  say  that  ecs  is  not  effective  after  15  to  30  seconds, 
whereas  others  claim  that  ecs  will  impair  long-term  retention  when  given 
hours  after  the  learning  trials. 

Drugs  that  inhibit  protein  synthesis  when  given  before  learning 
trials  do  not  impair  learning,  or  the  retention  of  that  learning,  for  as 
long  as  3  to  6  hours.  Testing  after  6  hours,  however,  shows  a  marked 
loss  of  memory.  If  the  drug  is  given  shortly  after  the  learning  task, 
memory  is  not  inhibited.  These  results  could  suggest  a  dual  trace 
theory  of  memory.  Short-term  memory  and  long-term  memory  would  be 
separate  processes;  the  former  lasting  up  to  6  hours  after  the  learning 
trial,  the  latter  being  initiated  during  learning  and  not  susceptible  to 
protein inhibition only a few minutes after the learning trial (Barondes,
1970). The duration of this “short term” memory, however, is a function
of how well the animal is trained, suggesting that the protein
inhibitors  might  simply  be  weakening  the  long-term  trace  that  has  been 
derived  from  a  short-term  trace. 

Puromycin,  which  inhibits  protein  synthesis  and  has  various  other 
central  nervous  system  effects,  can  cause  retrograde  amnesia  when 
given  up  to  several  days  following  a  learning  situation.  Normal  saline, 
injected  into  the  same  place  as  the  puromycin,  can  reverse  these  effects 
and  restore  the  memory.  It  has  been  suggested  that  puromycin  may 
disrupt  the  retrieval  rather  than  the  storage  of  information  (Jarvik, 
1972). 

The  plethora  of  experiments  dealing  with  ecs  and  drug  effects  on 
memory  have  resulted  in  a  confused,  controversial,  and  often  contra¬ 
dictory  literature  that  is  well  reviewed  by  Deutsch  (1969)  and  Jarvik 
(1972).  Perhaps  the  most  reasonable  hypothesis  of  the  moment  is  the 
following:  The  short-term  memory  reported  by  clinicians,  psychologists, 
and  some  investigators  to  last  about  30  seconds  is  indeed  a  transitory 
electrical  reverberation  that  is  consolidated  into  a  more  durable  long¬ 
term  memory.  However,  the  strength  and  accessibility  of  this  long-term 
memory  is  quite  variable  and  is  a  function  of  the  number  of  retrieval 
traces  laid  down  during  learning  and  of  the  use  of  old  retrieval  traces 
and  the  construction  of  new  retrieval  traces  to  the  long-term  memory 
after  the  initial  learning  trial. 

Clinical  observations  have  localized  the  hippocampus  as  that  part 
of  the  brain  responsible  for  the  consolidation  of  memories.  In  the  case 
of  Henry  M.  (Barbizet,  1970),  complete  surgical,  bilateral  ablation  of 
the hippocampi prevented him from learning anything new following
his operation. There was no change in iq, no loss of preoperational
memories,  and  no  abnormality  in  his  ability  to  recall  digits  immediately 
after  hearing  them.  His  crippling  deficit  involved  an  inability  to  recall 
anything  that  had  happened  earlier  than  a  minute  before  the  present 
or  later  than  the  day  of  his  operation. 

A similar dysfunction is part of Korsakoff’s syndrome, seen in
chronic alcoholics. Here the pathology seems to affect the mammillary
bodies, the dorsal thalamus, and the terminal fornix, areas of the
brain which, along with the hippocampus, form part of the limbic system.
This system also is the center for innate emotions, feelings, and the
regulation of hunger, thirst, rage, and sexual activities (Pribram, 1971).
Patients with Korsakoff’s syndrome will frequently be unable to remember
anything that occurred during the course of their disease and will
confabulate these memories if questioned. However, it appears that
they do retain long-term memories (Barbizet, 1970).

Pathological  dysfunctions  in  long-term  memory  such  as  Alzheimer’s 
disease  or  senile  dementia  do  not  appear  to  be  localized,  but  consist  of 
diffuse  damage  throughout  the  cortex.  Terminal  Alzheimer’s  and  severe 
dementia  leave  the  patient  completely  unable  to  learn,  communicate, 
and  function  or  care  for  himself. 

These  clinical  studies  have  demonstrated  that  long-term  memory 
stores  are  much  less  susceptible  to  damage  than  is  the  consolidating 
process.  This  is  expressed  in  the  general  maxim  that  anterograde  mem¬ 
ory  loss  is  nearly  always  greater  than  retrograde  memory  loss. 

The  hippocampus  appears  to  act  as  the  “store”  mechanism  for  the 
brain.  It  is  interesting  that  this  function  is  integrated  with  parts  of  the 
brain  which  attach  emotional  weight,  pleasure  or  pain,  to  external  per¬ 
ceptions.  Perhaps  such  emotive  interest  is  necessary  to  activate  the 
consolidation  of  a  short-term  percept. 

The  search  for  the  “engram,”  the  biological  material  that  is  a 
memory,  was  initiated  by  Lashley  in  1929.  He  would  train  animals  to 
a  task,  surgically  ablate  well-defined  areas  of  the  cortex,  and  see  if  the 
animal  still  was  able  to  perform  the  task.  He  found  that  long-term 
memories  were  very  difficult  to  destroy.  He  might  destroy  up  to  80% 
of  an  animal’s  visual  cortex,  and  still  the  animal  would  retain  the  visual 
discriminations it had learned. From his studies it appears that a long-term
memory trace is diffusely spread throughout a significant portion of the cortex.

Assuming,  quite  simplistically,  that  memories  of  certain  “percepts” 
might  be  localized  to  specific  cells  or  association  networks  of  cells,  then 
long-term  learning  would  take  place  when  any  two  of  these  percepts 
were temporally paired (as in conditioning experiments). Considering
the large number of interconnections between neurons, one might
postulate  that  learning  is  the  facilitation  of  preexisting  synapses,  per¬ 
haps  through  an  increase  in  transmitter  receptors  at  the  post-synaptic 
membrane  or  in  transmitter  substance  in  the  presynaptic  bouton.  Long¬ 
term  learning  could  also  be  the  growth  of  new  connections  between 
neurons  or  association  networks,  directed,  perhaps,  by  some  neural 
growth  factor  excreted  only  by  excited  neurons. 

Though  it  has  been  rather  conclusively  shown  that  adult  neural  cells 
do  not  reproduce,  anatomical  studies  have  shown  that  neural  lesions  are 
sometimes “repaired” by the growth of the dendritic and axonal networks
of the remaining cells (Rose et al., 1969). It has also been shown
that there are consistent differences between the brains of rats placed in a
stimulating environment with other rats and various toys and those of rats
placed in an impoverished environment where they are isolated and have
little stimulation. The former have thicker cortices, heavier occipital
cortices, larger neural cell bodies and nuclei, more dendritic spines,
larger synaptic junctions, an increase in acetylcholine, and a greater
number of glial cells (support cells for the neurons) (Rosenzweig
et al., 1972). The changes show, for the first time, that experience results
in measurable brain alterations, but the behaviors, and the changes
they  caused,  are  too  general  to  demonstrate  underlying  mechanisms, 
though  they  are  consistent  with  both  the  synaptic  facilitation  and  neural 
growth  hypotheses. 

Perhaps the most outstanding example of information storage in
nature  is  the  dna  molecule  that  encodes  all  the  information  necessary 
for the construction of an entire organism within the structure of molecules
that weigh about 10⁻¹² gram (Watson, 1970). It has been suggested
that memories may be stored in a like fashion in dna or rna
(the  chemical  that  transfers  the  dna  message  throughout  an  individual 
cell and regulates the production of cellular proteins). Some researchers
claim that rna or proteins transferred from animals conditioned to
a certain task help naive animals learn the task faster. However, no
one  has  yet  reproducibly  demonstrated  that  rna  or  more  than  a  few 
small,  specific  proteins  can  cross  the  mammalian  brain’s  blood-brain 
barrier  (Pribram,  1971). 

Hyden (1969) taught rats to balance on a wire and then examined
the part of the brain that controls balance for changes in rna. He
found  that  stimulated  brain  cells  produced  more  rna  than  any  other 
tissue  in  the  body.  He  also  found  that  the  type  of  rna  being  produced 
had  qualitatively  changed.  After  stimulation,  the  rna  in  the  neural  cells 
decreased,  but  there  was  a  consonant  increase  of  rna  in  the  neurons’ 
glial cells similar to the rna that had been produced in the neurons.
Evidently, learning causes changes in neurons, and the implementation
of  such  a  change  in  a  cell  necessarily  involves  the  production  of  more 
and  different  rna.  The  temporal  contiguity  of  the  disappearance  of 
neural  rna  and  the  appearance  of  similar  glial  rna  is  provocative,  but 
the  experiments  are  still  controversial  and  the  significance  of  the  results 
unclear. 

Although  we  have  some  idea  of  how  memory  is  stored,  how  it  is 
structured  in  storage,  and  how  it  is  retrieved,  we  have  little  idea  of  the 
biological  correlates  of  these  processes.  Work  with  simplified  neural 
systems such as those in Aplysia holds much promise for elucidating
the  biochemical  dynamics  that  accompany  new  learning. 

Neural  Data  Processing  at  the  Gross  Anatomical  Level.  As  in 
the  case  of  the  hippocampus  and  the  limbic  region,  neurologists  have 
ascribed  general  and  even  quite  specific  data  processing  functions  to 
gross  regions  of  the  brain  through  the  careful  testing  of  patients  with 
defined  forms  of  brain  damage.  A  specific  example  of  this  approach  is 
the  identification  of  the  respective  functions  of  the  right  and  left  cerebral 
cortices  (Gazzaniga,  1970). 

Nearly  all  people,  excepting  15%  of  the  left-handers,  are  left- 
dominant  for  speech  (that  is,  the  left  hemisphere  of  the  brain  is  re¬ 
sponsible  for  their  capability  to  hear,  understand,  and  speak  language). 
It  is  well  known  that  the  left  hemisphere  deals  with  the  motor  and 
sensory  functions  of  the  right  side  of  the  body,  and  vice  versa.  In  man 
this  is  true  for  eyesight,  where  the  left  side  of  the  brain  sees  the  right 
visual  field  (those  objects  to  your  right),  and  vice  versa  (Fig.  1-5).  The 
corpus  callosum  is  a  thick  sheet  of  neural  fibers  that  is  the  sole  source  of 
communication  between  the  right  and  left  hemispheres  (Fig.  1-6).  In 
patients  with  severed  corpus  callosums  the  separate  functions  of  the 
two  hemispheres  can  be  studied  by  presenting  visual  data  to  either  the 
left  or  right  visual  fields,  or  tactile  data  to  either  the  left  or  right  hands. 
A  word  presented  to  the  left  visual  field  cannot  be  vocalized  by  such  a 
subject because the image is perceived by the right hemisphere. However,
if the word is “banana” and the left hand (also controlled by the
right  hemisphere)  must  choose  between  a  number  of  objects  that  can¬ 
not  be  seen,  the  left  hand  invariably  chooses  the  banana.  Thus,  the  word 
has  been  perceived  and  translated  by  the  right  hemisphere  into  an  ap¬ 
propriate  motor  action,  though  the  instructions  and  the  word  “banana” 
have  not  been  consciously  heard  and  the  subject  has  no  idea  of  what 
he  did. 

Thus, we have two brains: one conscious in the sense that it can
hear, understand, and repeat back what is said to it; and one that reacts
to stimuli and performs activities that we will not be aware of
unless our corpus callosum is intact. It has been found that the left
hemisphere is normally superior to the right in speaking, writing,
calculating, and solving maze problems. The right is superior to the left
in three-dimensional drawing and singing.

Figure 1-5. How the left side of the brain sees the right visual field
(and vice versa).

Figure 1-6. The corpus callosum.

Similar  studies  have  shown  that  there  are  rather  discrete  areas  of 
each cortex for visual, auditory, olfactory, gustatory, and somatic perceptions
and secondary processing. In addition, “association areas”
have  been  identified,  which  integrate  the  various  sensory  modalities. 
For  instance,  the  ablation  of  Wernicke’s  area  results  in  the  subject’s 
inability  to  repeat  words  he  reads  or  hears  and  to  emit  meaningful 
sentences.  Instead,  strange  strings  of  nonsense  phrases  and  words 
are  spoken.  It  is  believed  that  destruction  of  this  area  disassociates 
the thinking, hearing, and seeing portions of the cortex from the area
that converts thoughts into the motor actions that lead to speaking.
Ablations  in  Broca’s  area  cause  aphasia — the  nearly  complete  inability 
to  speak  any  words  even  though  the  patient  can  still  write  his  com¬ 
munications  in  a  normal  fashion  (Geschwind,  1972). 

While  certain  functions  have  been  rather  discretely  localized,  other 
tasks,  such  as  the  ability  to  recognize  simple  figures  hidden  in  more 
complex  figures,  seem  to  be  a  function  of  how  much  material  has  been 
lost  from  any  or  all  portions  of  the  neocortex. 

Neural  Data  Processing  at  the  Cellular  Level.  Digital  computers 
typically  have  certain  built-in  information  processing  functions  for  coding 
and  decoding  input-output  information,  for  the  transferral  of  data  from 
the  storage  units  to  the  general  registers,  and  for  the  handling  of  data  in 
the  general  registers.  Certain  neural  functions  and  organizations  have 
been  discovered  for  data  processing;  these  will  be  discussed  for  the 
particular  case  of  visual  perception  (see  Chapter  5). 

The  retina  of  the  eye  converts  patterns  of  photons  into  more  con¬ 
densed  patterns  of  electric  impulses  in  the  optic  nerve  (there  is  about  a 
tenfold  contraction  of  the  information).  There  are  about  100  million 
rod  and  cone  receptors  in  the  retina.  In  each  cell,  carotene  attached  to 
the  enzyme  rhodopsin  produces  molecular  complexes  sensitive  to  visual 
wavelengths  of  light.  Photons  induce  a  structural  change  in  carotene, 
and  this  change  triggers  a  receptor-cell  voltage  potential  that  is  com¬ 
municated  to  the  “bipolar”  cells,  which  in  turn  innervate  the  ganglion 
cells  of  the  optic  nerve.  The  receptor,  bipolar,  and  ganglion  cells  are 
interconnected by amacrine and horizontal cells, which regulate how
many and which receptor cells will communicate with ganglion cells
via the bipolar cells. In the macula lutea portion of the eye there is one
receptor  cell  for  each  ganglion  cell;  in  the  other  areas  of  the  eye,  up  to 
100  receptor  cells  may  stimulate  a  ganglion  cell  (Fig.  1-7). 

The  electrical  activity  of  all  ganglion  cells  is  greatest  in  the  dark; 
when exposed to light, the interconnecting amacrine cells provide inhibitory
“gates” that reduce the sensitivity of the surrounding receptors.
This  surround  inhibition  is  responsible  for  the  heightened  contrast  one 
sees  at  the  borders  of  two  different  light  intensities.  When  one  looks  at 
the border between light and dark shades, the dark border is darker
than the rest of the dark shade and the light border is lighter than the
rest of the light shade. In actuality the light intensity of each band is
uniform; the heightened contrasts are the result of surround inhibition. If
one postulates that stimulated cells inhibit their neighbors’ rate of firing
to a degree related directly to the light intensity and inversely to the
distance from the neighboring cells, and if the receptors are otherwise
uniformly stimulated by the incoming light, then it follows that those
cells exposed to the lighter band and near the border will be stimulated
as much as their fellow “light” cells. However, they will be inhibited
less by their neighbors because some of their neighbors are “dark”
cells which, because they are stimulated less by the dark band, will inhibit
their neighbors less. Conversely, the dark cells near the border
will be inhibited more than their fellow dark cells because some of
their neighbors are the light cells, which inhibit their neighbors more
than dark cells. Surround inhibition (more thoroughly explained in
Ratliff, 1972) is one of the fundamental informational processing pathways
used throughout the mammalian brain as well as in the eye.

Figure 1-7. Surround inhibition.
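The postulate about surround inhibition can be checked with a small calculation. The following Python sketch is only an illustration and is not a model taken from Ratliff: it assumes a single row of receptors, a hypothetical inhibition constant k, and a hypothetical neighborhood radius. Each cell's output is taken to be its own stimulation minus k times each neighbor's light intensity divided by that neighbor's distance.

```python
def surround_inhibition(intensity, radius=3, k=0.05):
    """Return each cell's response after inhibition by its neighbors.

    Inhibition from a neighbor is proportional to the neighbor's light
    intensity and inversely proportional to its distance (both assumed).
    """
    n = len(intensity)
    response = []
    for i in range(n):
        inhibition = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            if j != i:
                inhibition += k * intensity[j] / abs(i - j)
        response.append(intensity[i] - inhibition)
    return response


# Two uniform bands: ten "dark" cells followed by ten "light" cells.
DARK, LIGHT = 1.0, 5.0
stimulus = [DARK] * 10 + [LIGHT] * 10
response = surround_inhibition(stimulus)

# Cells 5 and 15 lie well inside their bands; cells 9 and 10 sit on the border.
for index in (5, 9, 10, 15):
    print(index, round(response[index], 3))
# 5 0.817
# 9 0.45
# 10 4.45
# 15 4.083
```

Under these assumptions the light cell at the border (4.45) responds more strongly than a light cell in the interior (4.083), and the dark cell at the border (0.45) responds less than an interior dark cell (0.817), although the stimulus within each band is uniform. This is the heightened contrast described in the text.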

Some visual receptors cause their ganglion to fire when stimulated
by  light  (“on”),  when  not  stimulated  (“off”),  or  only  when  the  light 
changes  (“on-off”).  All  receptors  fire  rapidly  when  first  stimulated, 
which  helps  to  explain  why  mammals  preferentially  attend  to  moving 
objects.  One  organization  of  the  receptors  in  the  receptive  field  of  a 
single  ganglion  is  the  round  field,  where  the  center  is  on  and  the  periph¬ 
ery  off,  with  an  on-off  interface  between  the  two.  Other  receptive  fields 
are  shaped  so  that  they  respond  preferentially  to  edges,  curves,  and  lines 
(Spinelli,  1966). 

If  any  receptive  field  sees  the  same  image  for  more  than  30  seconds, 
the  bipolar  and  amacrine  cells  adapt  to  the  receptor  stimulation  such 
that  the  ganglion  is  no  longer  stimulated  and  the  object  no  longer  seen. 
Consequently,  the  eye  is  always  moving  so  that  the  receptors  will  not 
see  the  same  image  for  more  than  several  seconds,  though  these  move¬ 
ments  are  normally  very  small.  In  the  central  nervous  system,  this 
mechanism is called habituation, and it allows the organism to screen
out  “background”  noises  when  attending  to  a  specific  percept  (Thomp¬ 
son,  1967). 

The  optic  ganglia  form  a  one-to-one  projection  to  the  lateral 
geniculate,  where  colors  are  mixed  and  the  on-off  responses  of  the 
ganglia are separated. A cell from the lateral geniculate may contact up
to 5000 cells in the striate area of the occipital cortex (the rear end of
the brain) where actual “seeing” takes place. Hubel and Wiesel discovered
very specific feature detection cells in the occipital cortex which
are  arranged  in  what  seems  to  be  an  ascending  hierarchy  of  complexity. 
The  procedure  that  they  and  many  other  investigators  have  used  is 
the  recording  of  induced  responses  by  microelectrodes.  Microelec¬ 
trodes  are  carefully  placed  into  single  neural  cells  in  the  brain.  Then 
the  animal  is  presented  with  very  specific  stimuli  and  the  electrical  re¬ 
sponse  of  the  single  cell  is  recorded. 

“Simple”  cells  respond  to  a  line  at  a  certain  angle  in  a  certain  small 
defined  area  of  the  retina.  “Complex”  cells  will  respond  similarly  to 
a line, but at any point in a much larger retinal field. “Hypercomplex”
cells require a given length in addition to a given orientation in
order  to  fire,  and  “higher  order  hypercomplex”  cells  respond  only  to 
lines  that  form  certain  angles.  Cells  responding  to  lines  at  a  certain 
orientation  are  arranged  in  columns  perpendicular  to  the  surface  of 
the  visual  cortex,  and  groups  of  these  columns  responding  to  all  the 
various orientations for a certain area of the retina are arranged together
(Hubel and Wiesel, 1962, 1963; Lindsay and Norman, 1972).
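The difference between “simple” and “complex” cells can be made concrete with a toy program. The Python sketch below is a loose illustration under strong assumptions, not a description of the recordings themselves: the retina is a small grid of 0s and 1s, only two orientations are considered, and the names simple_cell and complex_cell are invented for the example.

```python
# Step (d_row, d_col) taken along a line of each orientation.
ORIENTATIONS = {"horizontal": (0, 1), "vertical": (1, 0)}


def simple_cell(image, row, col, orientation, length=3):
    """Fire only if a line of the given orientation is centered at (row, col)."""
    d_row, d_col = ORIENTATIONS[orientation]
    half = length // 2
    for step in range(-half, half + 1):
        r, c = row + step * d_row, col + step * d_col
        inside = 0 <= r < len(image) and 0 <= c < len(image[0])
        if not inside or image[r][c] != 1:
            return False
    return True


def complex_cell(image, orientation, length=3):
    """Fire if such a line appears at any point in the larger field."""
    return any(
        simple_cell(image, r, c, orientation, length)
        for r in range(len(image))
        for c in range(len(image[0]))
    )


# A short vertical line in the fourth column of a 5-by-5 "retina".
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

print(simple_cell(image, 2, 3, "vertical"))   # True: right place, right angle
print(simple_cell(image, 2, 1, "vertical"))   # False: right angle, wrong place
print(complex_cell(image, "vertical"))        # True: position does not matter
print(complex_cell(image, "horizontal"))      # False: wrong orientation
```

Here the complex cell is built simply by combining the outputs of simple cells over a wider field, which is one way to read the ascending hierarchy described above; the actual wiring in the cortex is not established by this sketch.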

The spatial arrangement of these cells and their hierarchical nature
suggest a feature detection model of visual perception. Most simply,
this model suggests that any percept is the summation of the discrete
features reported by each of the receptive fields of the cortical neurons.
The “pandemonium model” and other models of pattern perception are
discussed  in  Chapter  5. 

Motivation. Life forms are essentially chemical information processors
designed to preserve the chemical information that describes them
within  the  gene  pool  of  a  species.  Complex  mammalian  intelligence  is  one 
of  a  variety  of  strategies  that  tends  to  preserve  certain  information. 
Thus,  tendencies  are  built  into  biological  organisms  to  insure  survival; 
for  instance,  the  tendency  to  repeat  behaviors  that  reward  an  individual 
with  food. 

A  great  deal  of  work  is  currently  being  invested  in  finding  out  why 
(1)  mammals,  especially  humans,  do  what  they  do,  and  (2)  the  basic 
biochemical  and  neurophysiological  mechanisms  underlying  motivation 
and  emotion.  This  literature  is  very  extensive  and  will  not  be  reviewed 
here. 

Review.  The  major  questions  concerning  the  nature  of  biological 
intelligence remain unanswered. What are the information processing
functions  of  neural  and  glial  cells?  How  do  context,  expectations,  and 
perceived  features  blend  to  make  an  understandable  perception?  How 
do  experiences  become  memories  in  long-term  storage?  What  is  the 
biochemical  substrate  of  memory?  At  what  level  do  perceptions  enter 
consciousness;  when  and  where  do  cortical  electricity  and  chemical 
transmitters  become  perceived  thoughts?  Artificial  intelligence  will  cer¬ 
tainly  be  a  major  contributor  to  the  answering  of  these  questions. 


COMPUTERS  AND  SIMULATION 

Before  concluding  the  discussion  of  the  first  and  second  questions 
cited  at  the  start  of  this  chapter,  some  mention  should  be  made  of  the 
basic  technique  used  in  ai  research.  One  significant  fact  is  that  it  is  not 
necessary  to  build  a  different  physical  machine  each  time  we  wish  to 
investigate  a  new  machine’s  abilities.  A  kind  of  machine  exists  which  is 
capable  of  accepting  a  symbolic  description  (in  the  form  of  a  program) 
of  any  machine  and  of  simulating  the  machine  described  by  such  a 
program.  The  general-purpose  digital  computers  are  examples  of  this 
kind  of  machine. 
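As a small illustration of this idea, the Python sketch below accepts a symbolic description of a machine and simulates it. The format of the description (a start state plus a table of rules) and the two example machines are assumptions made for the illustration; they are not taken from any particular program discussed in this book.

```python
def simulate(description, inputs):
    """Simulate the machine described by `description` on a list of inputs.

    The description is a dictionary with a "start" state and a "rules"
    table mapping (state, input) to (next state, output).
    """
    state = description["start"]
    outputs = []
    for symbol in inputs:
        state, output = description["rules"][(state, symbol)]
        outputs.append(output)
    return outputs


# Machine 1: reports whether it has seen an odd or even number of 1s.
parity_machine = {
    "start": "even",
    "rules": {
        ("even", 0): ("even", "even"),
        ("even", 1): ("odd", "odd"),
        ("odd", 0): ("odd", "odd"),
        ("odd", 1): ("even", "even"),
    },
}

# Machine 2: a coin-operated turnstile.
turnstile = {
    "start": "locked",
    "rules": {
        ("locked", "coin"): ("unlocked", "unlock"),
        ("locked", "push"): ("locked", "refuse"),
        ("unlocked", "push"): ("locked", "lock"),
        ("unlocked", "coin"): ("unlocked", "return coin"),
    },
}

parity_result = simulate(parity_machine, [1, 0, 1, 1])
turnstile_result = simulate(turnstile, ["push", "coin", "push"])
print(parity_result)     # ['odd', 'odd', 'even', 'odd']
print(turnstile_result)  # ['refuse', 'unlock', 'lock']
```

The same simulate function runs both machines; only the description changes. No new physical device is needed to investigate a new machine, only a new description.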

Computers  typically  have  five  main  components:  an  input  unit,  a 
control unit, a logic unit, a storage unit, and an output unit. The program
and other data go into the computer via the input unit and are
stored in the storage unit, or memory, of the computer. The logic and
control  units  alter  the  information  in  the  storage  unit  of  the  computer 
in  a  manner  that  is  dependent  upon  the  program.  Also  in  a  manner  de¬ 
pendent  on  the  program,  the  control  unit  causes  the  output  unit  to  emit 
information  (e.g.,  punched  cards,  electric  impulses,  printed  paper). 
Computers can be designed so as to utilize a wide range of input-output
devices,  from  television  cameras  and  crt  (cathode-ray  tube)  display 
screens  (like  a  television  set)  to  mechanical  arms  and  typewriter-like 
terminals. 
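The division of labor among the five units can be sketched as a toy computer in Python. The instruction names (IN, ADD, SUB, OUT, HALT) and the layout of the program are hypothetical, and for brevity the program is kept apart from the data in storage, whereas the description above has both residing in the storage unit.

```python
def logic_unit(operation, a, b):
    """Logic unit: carries out the arithmetic requested by the control unit."""
    if operation == "ADD":
        return a + b
    if operation == "SUB":
        return a - b
    raise ValueError(f"unknown operation: {operation}")


def run(program, input_values, storage_size=8):
    """Control unit: steps through the program and directs the other units."""
    storage = [0] * storage_size   # storage unit (memory)
    pending = list(input_values)   # input unit: values waiting to be read
    output = []                    # output unit: values emitted so far
    counter = 0                    # position of the next instruction
    while True:
        instruction = program[counter]
        counter += 1
        name = instruction[0]
        if name == "IN":           # copy the next input into storage
            storage[instruction[1]] = pending.pop(0)
        elif name in ("ADD", "SUB"):
            _, left, right, destination = instruction
            storage[destination] = logic_unit(name, storage[left], storage[right])
        elif name == "OUT":        # emit a stored value
            output.append(storage[instruction[1]])
        elif name == "HALT":
            return output
        else:
            raise ValueError(f"unknown instruction: {name}")


program = [
    ("IN", 0),          # read first number into cell 0
    ("IN", 1),          # read second number into cell 1
    ("ADD", 0, 1, 2),   # cell 2 = cell 0 + cell 1
    ("OUT", 2),
    ("SUB", 0, 1, 3),   # cell 3 = cell 0 - cell 1
    ("OUT", 3),
    ("HALT",),
]
result = run(program, [7, 5])
print(result)  # [12, 2]
```

What the machine emits depends entirely on the program: supplying a different list of instructions changes its behavior without changing any of the five units.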

Computers  and  the  notion  of  “simulation”  are  discussed  more 
thoroughly  in  Chapter  2.  Briefly,  a  computer  simulates  something  if  it 
duplicates  that  thing’s  behavior.  The  duplication  does  not  have  to  be 
exact,  nor  does  it  have  to  proceed  at  the  same  rate  as  the  original.  Thus, 
a computer is said to simulate a person playing chess if it prints out
a  possible  move  on  a  sheet  of  paper  whenever  it  is  given  as  input  a 
description  of  a  possible  chessboard  configuration.  We  do  not  require 
that  the  computer  print  out  the  same  move  that  a  given  person  would 
make,  nor  must  the  computer  be  able  to  move  physically  the  pieces  of 
an actual chess set, nor does the computer require the same time to
make its move as a person would. A simulation may be a “speed-up”
or a “slow-up” of the original. Likewise, a computer is said to simulate
intelligence when it does something that a person needs intelligence to
do, that is, when its behavior corresponds in some manner to that of an
intelligent  person.  Thus,  the  extent  to  which  a  machine  simulates  intelli¬ 
gence  may  vary.  In  this  book  the  emphasis  is  on  the  ability  of  computers 
to  do  the  things  listed  at  the  start  of  this  chapter. 


NOTES 

1-1.  This  note  cites  some  general  references  on  the  subject  of  artificial 
intelligence.  First,  over  the  past  two  decades  several  authors  have  argued, 
both  pro  and  con,  the  possibility  of  artificial  intelligence;  that  is,  whether 
machines  can  eventually  be  made  to  possess  intelligence  on  a  human  level. 
Some  classic  papers  in  favor  of  the  possibility  are  those  of  Turing  (1947, 
1950)  and  Armer  (1963).  Some  recent  arguments  against  the  possibility  of 
artificial  intelligence  are  those  of  Dreyfus  (1965,  1972)  and  Jaki  (1969);  the 
arguments  of  Dreyfus  are  effectively  refuted  in  the  paper  by  Papert  (1968). 
(One  argument  against  the  possibility  of  ai  that  is  quite  commonly  put  forth 
is: “Computers can do only what they are told to do.” This is true, but no one
really  knows  the  limits  of  what  we  can  tell  computers  to  do;  perhaps  we  can 
tell them how to think, and how to learn; see Armer and Turing.) A number
of books besides this one have been published about artificial intelligence or
about specific areas of the subject: see Feigenbaum and Feldman (1963),
Banerji  (1969),  Slagle  (1971),  Minsky  (1968b),  and  Nilsson  (1971). 
(Minsky  [1963,  1966,  1970]  has  also  written  a  number  of  stimulating 
papers  on  artificial  intelligence.)  Two  journals,  Artificial  Intelligence  and 
Pattern  Recognition,  regularly  publish  papers  that  are  of  interest  to  the 
ai researcher. The (voluminous) Proceedings of the International Joint
Conference  on  Artificial  Intelligence  contains  many  important  papers:  to 
date,  the  ijcai  has  been  held  twice,  in  1969  and  1971,  and  the  proceedings 
of  each  conference  have  been  published.  Papers  on  artificial  intelligence 
may  also  be  found  in  the  Journal  of  the  Association  for  Computing  Ma¬ 
chinery  (jacm),  the  Communications  of  the  Association  for  Computing 
Machinery (cacm), and the Proceedings of the Spring and Fall Joint Computer
Conferences (sjcc and fjcc) of the American Federation of Information
Processing Societies (afips). Finally, a series of volumes entitled
Machine Intelligence includes many important papers. Information
about  these  books  and  journals  is  provided  in  the  Bibliography. 

1-2.  This  text  uses  the  phrases  “human  intelligence”  and  “intelligence  on 
a  human  level”  somewhat  loosely,  without  really  attempting  to  define  the 
word “human.” In other books it is sometimes used as though it might
apply only to the species Homo sapiens; at other times it is used as though
it  might  apply  to  other  animals.  How  “human”  is  an  ant,  a  cat,  a  dog,  a 
dolphin?  If  the  author  were  asked  to  venture  an  opinion,  he  would  prob¬ 
ably  say  that  the  word  “human”  refers  to  a  kind  of  relationship  that  can 
exist  in  the  interaction  of  intelligent  beings.  This  relationship  helps  deter¬ 
mine  their  behavior  toward  each  other,  toward  other  beings  and  objects, 
and  (perhaps  necessarily)  toward  themselves.  Cats  and  dogs  often  par¬ 
ticipate  in  this  relationship,  and  so  are  partly  “human.”  Dolphins  may  con¬ 
sider  themselves  to  be  very  “human,”  as  may  any  creatures  from  outer 
space  that  we  might  someday  happen  to  meet,  and,  conceivably,  it  may  be¬ 
come  conventional  to  think  of  some  machines  as  “human.”  (See  the 
Exercises  for  this  chapter;  also  see  Chapter  9.) 

1-3. The area of research that attempts to simulate the underlying
processes involved in natural intelligence is known as simulation of cognitive
processes. (See various entries of Computers and Thought in the
Bibliography, cited as CT, for some introductory and early papers.) The
coverage in this book is, again, primarily concerned with the extent to
which machines can simulate the abilities of natural intelligence; only
secondarily is the simulation of cognitive processes considered. However,
it should be pointed out that some AI researchers view their work as being
directed toward both goals; the subjects are certainly not mutually exclusive.
Also, for the sake of exposition, we shall occasionally describe the
processes used by intelligent machines in “personalistic” or “mentalistic”
terminology, as though they were really similar to the cognitive processes
used by people (or more exactly, as though they were similar to the cognitive
processes that people often describe as being the ones they use): “I just
had an idea,” “My model didn’t include that,” “That was my concept also,”
“I’ve got a plan.” See the discussion in note 7-1.

1-4.  Turing’s  test  is  discussed  in  greater  detail  in  the  paper  by  Colby, 
Weber,  Hilf,  and  Kraemer  (1971). 


EXERCISES 

1-1. Read Descartes and see if you can determine whether he thought machines
could  reproduce  themselves. 

1-2. Two other introspective philosophers were Montaigne and Pascal. What
do  you  think  their  attitudes  would  have  been  toward  artificial  intelligence?  How 
about  Jefferson,  Marx,  Archimedes,  and  Einstein? 

1-3. What do you think intelligence is?


The  Archimedean  sunflower. 


MATHEMATICS, PHENOMENA, MACHINES


INTRODUCTION 

This  chapter  investigates  in  detail  some  of  the  mathematical  back¬ 
ground  applicable  to  artificial  intelligence.  (The  reader  who  wishes  to 
commence  the  study  of  artificial  intelligence  research  itself  should  turn 
to Chapter 3.) It presents a somewhat condensed discussion of automata
theory,  the  branch  of  mathematics  dealing  with  the  nature  of  machines, 
since  the  way  in  which  mathematics  can  be  used  to  describe  the  oper¬ 
ation  of  machines  is  essentially  the  way  it  can  be  used  to  describe  natural 
phenomena  in  general.  Thus,  automata  theory  is  a  foundation  for 
artificial intelligence (AI) research. It helps define the generality of a
study  that  relies  on  computer  programs  to  describe  the  phenomenon 
of  intelligence. 

In  addition  to  discussing  machines,  the  nature  of  mathematics  it¬ 
self  will  be  discussed,  with  reference  to  the  question,  “Are  there  some 
things  mathematics  cannot  describe  completely?”  It  is  argued  in  an  in¬ 
formal,  yet  mathematical  way  that  the  answer  is  yes.  There  are  limi¬ 
tations  in  the  method  of  artificial  intelligence  research  because  it  is  based 
(as  is  all  science)  on  mathematics  and  the  capacities  of  mathematical 
descriptions. These limitations say nothing definite about whether AI
research will succeed, only that it might not. The final discussion considers
some very specific limits to the computational abilities of machines.


ON  MATHEMATICAL  DESCRIPTION 

A  mathematical  description  of  something  consists  of  a  finite  set  of 
statements  (axioms)  that  utilize  a  finite  set  of  undefined  terms,  to¬ 
gether  with  a  finite  set  of  rules  that  govern  the  derivation  of  new  state¬ 
ments  from  the  axioms  and  from  previously  derived  statements.  Such  a 
collection of statements is called a mathematical system, or theory, and
the  concept  is  that  any  statement,  either  given  or  derivable,  is  a  true 
statement  concerning  the  thing  described  by  the  theory.  A  mathematical 
theory  may  thus  enable  one  to  use  a  finite  number  of  statements  to 
describe  something  about  which  an  infinite  number  of  statements  (those 
derivable  under  the  theory)  are  true. 

For  example,  the  mathematical  theory  of  Euclidean  geometry  gives 
us  certain  axioms  or  postulates  concerning  the  undefined  concepts  of 
“point,”  “line,”  “plane,”  “between,”  etc.;  the  “thing”  described  by  this 
theory is a “geometry,” consisting of interrelationships existing among
lines,  points,  planes,  circles,  spaces,  etc. 

The  ingredients  of  a  mathematical  theory,  then,  are  the  following: 

1.  A  set  of  basic  words  (e.g.,  “point,”  “line,”  “between,”  “dis¬ 
tance,”  “x,”  “y,”  “not,”  “implies,”  “for  all,”)  that  refer  to 
different  objects,  relations  between  objects,  variables,  logical 
connectives,  quantifiers,  and  so  on.  These  are  the  undefined 
words  or  symbols  of  the  theory. 

2.  A  set  of  basic  sentences  made  of  these  basic  words.  These 
basic  sentences  are  the  axioms  or  postulates  of  the  theory. 

3.  A  set  of  logical  rules,  also  made  of  these  basic  words,  that 
tells  us  how  to  derive  new  sentences  from  the  ones  we  are 
given. 

Now,  it  is  the  essence  of  mathematical  theories  (note  2-1)  that 
each  of  these  sets  be  finite;  the  object  described  by  the  theory  may  be 
infinite,  but  the  theory  that  describes  it  must  be  finite.  In  other  words, 
the  fact  that  there  is  a  mathematical  way  of  describing  some  object 
means  that  it  is  finitely  describable. 

This  does  not  imply  the  converse,  that  if  a  thing  is  finitely  de¬ 
scribable  it  is  therefore  mathematically  describable.  It  would  take  us 
too  far  afield,  however,  to  consider  this  converse  proposition  (known 
as Church’s thesis, or Turing’s thesis) in detail (note 2-2). Since our
interest is in mathematics and science, henceforth consider the phrases
“finitely  describable”  and  “mathematically  describable”  to  be  synony¬ 
mous. 

A  mathematical  description  of  something  is  thus  a  possibly  infinite 
yet  finitely  describable  set  of  sentences,  each  of  which  states  something 
about the thing being described. If the thing (note 2-3) is infinite and
yet finitely describable, then, intuitively, there are “patterns” which hold
throughout the thing, and these patterns form the basis of our mathematical
description. Thus, the Frontispiece figure to this chapter shows
a collection of dots which could be infinitely extended. The entire collection,
so extended, would be an “infinite thing.” Yet the entire collection
can be finitely described (see Exercise 2-1) because a “pattern”
exists in the placement of the dots.

However,  the  simple  existence  of  patterns  in  something  does  not 
guarantee  that  the  thing  is  finitely  describable:  There  may  be  an  infinite 
number  of  patterns,  none  of  which  can  be  predicted  from  the  others, 
each  pattern  adding  its  own  infinite  set  of  parts  (“dots”)  to  the  thing. 

So  there  are  three  possibilities  that  may  hold  if  we  are  asked  to 
describe  something  in  a  mathematical  way:  The  thing  may  be  finite,  in 
which  case  presumably  it  is  finitely  describable  (note  2-4);  the  thing 
may  be  infinite  and  yet  finitely  describable;  the  thing  may  be  infinite 
and  not  finitely  describable. 

If  the  third  possibility  is  the  one  that  actually  holds,  then  in  fact 
we  shall  never  be  able  to  describe  completely  all  of  the  thing  in  question. 
Rather  we  shall  always  be  making  discoveries  like  “there’s  another  dot 
my  description  doesn’t  predict,”  or  (perhaps)  “oops,  there’s  another 
subatomic particle . . .”

As an indication (note 2-5) that there may be some things that
cannot  be  finitely  described,  consider  the  following  argument: 

Assume  we  had  a  mathematical  theory  that  would  enable  us  to 
finitely  describe  the  real  numbers;  that  is,  each  sentence  derivable  in 
the  theory  would  be  a  finite  description  of  a  real  number,  enabling  the 
decimal  expansion  of  that  number  to  be  computed  accurately  to  as 
many  places  as  desired.  It  is  the  nature  of  mathematical  theories  as  we 
have described them that they may imply only a countable number of
statements.  But  the  real  numbers  are  an  uncountable  set.  Thus,  no 
mathematical  theory  could  enable  us  to  derive  a  finite  description  for 
each  real  number;  there  must  always  be  some  real  numbers  that  are 
not  finitely  describable. 
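
The counting step in this argument can be made concrete. The following is a minimal sketch in Python, not part of the original text; the function name all_finite_strings and the two-letter alphabet are illustrative assumptions. It lists every finite string over a finite alphabet in a single sequence, shortest first, which is what it means for the set of such strings (and hence the set of statements a theory can derive) to be countable. No comparable listing exists for the real numbers.

```python
from itertools import count, islice, product


def all_finite_strings(alphabet):
    """Yield every finite string over `alphabet`, shortest strings first.

    Each string appears at some finite position in the sequence, so the
    whole (infinite) set of strings is countable.
    """
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


# The alphabet "ab" is a stand-in for a theory's finite set of basic words.
first_seven = list(islice(all_finite_strings("ab"), 7))
print(first_seven)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Any derivable statement is one of these strings, so the statements of a theory can be numbered 1, 2, 3, and so on, however many there are.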

All this explanation is by way of describing our notion of mathematical
description. A good example of the usefulness of this type of
description is the scientific method itself.

The scientific method is basically a way of selecting mathematical
descriptions  of  the  universe.  To  use  the  method,  one  develops  several 
different  mathematical  descriptions  of  the  known  universe  or  of  some 
part  of  the  known  universe  (some  set  of  “phenomena”  in  the  universe; 
see the next section): To each of these descriptions there is a corresponding
set of predictions that it makes about the rest of the universe;
one  rejects  those  descriptions  that  can  be  found  by  experiment  to  make 
false  predictions  or  which  make  the  same  predictions  as  do  other  “less 
complicated”  descriptions. 

The  scientific  method  has  had  many  successes  and  therefore  the  use¬ 
fulness  of  making  and  studying  mathematical  descriptions  of  things  is 
well  founded.  Still,  whenever  one  is  called  upon  to  consider  a  previ¬ 
ously  unstudied  phenomenon,  one  cannot  be  entirely  sure  that  it  can 
be  explained  by  the  predictions  of  one’s  current  mathematical  descrip¬ 
tions  of  the  universe.  The  reason  for  this  is  simple:  There  is  no  proof 
(note 2-6) that the universe is either a finite or an infinite thing. If
one  assumes  it  to  be  an  infinite  thing,  one  can  never  be  sure  in  a  finite 
amount  of  time  whether  mathematical  descriptions  have  been  developed 
to  account  for  all  the  patterns  that  hold  throughout  it. 

With  this  in  mind,  a  person  who  is  concerned  with  developing 
mathematical  descriptions  of  the  real  world  should  understand  that  he 
might  be  engaged  in  an  endless  undertaking.  It  could  be  the  case  that 
there  are  an  infinite  number  of  phenomena  in  the  universe,  none  of 
which  can  be  predicted  from  a  knowledge  of  other  phenomena  in  the 
universe.  It  could  even  be  the  case  that  some  phenomena  in  the  universe 
are  themselves  not  finitely  describable. 

On  the  other  hand,  it  could  be  true  that  the  universe  is  finite,  or  at 
least  finitely  describable. 

What  this  has  to  say  for  our  study  of  intelligence  is  simply  that 
our  success  is  not  guaranteed.  Current  scientific  theories  do  not  all 
describe  the  universe  as  being  finite.  The  caveat  concerning  the  possi¬ 
ble  existence  of  undescribable  phenomena  must  be  heeded:  There  is  no 
scientific  guarantee  that  natural  intelligence  can  be  finitely  described, 
either by our current scientific theories or by any mathematical description
that could ever be developed; it may simply not be finitely
describable.

For  this  reason  we  should  take  care  to  refer  to  the  field  we  are 
studying as “artificial intelligence research.” As will be seen in subsequent
sections, the notion of “machine” corresponds to that of a “finitely
describable  phenomenon.”  Since  it  is  an  open  question  whether  natural 
intelligence  is  a  phenomenon  that  can  be  finitely  described,  we  expect 
that “intelligent” machines will simulate some of the abilities of natural
intelligence, but whether they will have them all remains unknown.
Certainly,  the  evidence  available  suggests  that  intelligent  machines  will 
eventually  have  many  abilities  that  are  currently  limited  to  natural  in¬ 
telligence. 

THE  MATHEMATICAL  DESCRIPTION  OF 
PHENOMENA 

Time 

With  all  the  preceding  conjectures  in  mind,  let  us  see  how  it  is  that 
mathematics  can  be  used  to  describe  “phenomena,”  or  “processes”; 
that  is,  things  that  happen  in  reality. 

First  of  all,  let  us  list  names  for  some  phenomena  that  are  generally 
believed  to  exist.  (See  Exhibit  A.)  These  are  things  people  often  talk 
about  in  the  belief  that  they  happen  in  the  real  world.  Not  all  are  neces¬ 
sarily  things  that  can  be  described  mathematically. 


EXHIBIT  A 

the playing of a game
chemical reactions
the evolution of species
thought processes
nuclear reactions
a person feeling emotion
waves traveling through a medium
cellular growth of organisms
crystal formation
sexual reproduction
a candle burning
a person living
a person dying
a stone falling to the ground
a bird flying
the motion of a weight on a spring
the formation of public opinion
conversion of energy from one form to another
dreaming
flipping a coin
the operation of a computer program
weather

To a mathematician looking at Exhibit A, perhaps the most immediate
thing he would find common to all its elements is that each
element involves “time”; each of these things may be said to happen as
a “sequence of situations.”¹

Upon further inspection, the mathematician would discover that
each of the phenomena named in Exhibit A can happen in a variety of
ways. For some phenomena the variety is greater than for others. In
his desire to be general, he would say that the name of a phenomenon
refers to the set of different ways in which it can occur.

¹ An interesting and difficult open question is whether the automata-theoretic
description of phenomena presented throughout this book is in conflict with
relativistic findings concerning simultaneity. The reader interested in pursuing
this question is invited to see Waksman (1966).

A  third  thing  the  mathematician  might  note  about  Exhibit  A  is  that 
it  is  possible  for  some  phenomena  to  be  made  up  of  others;  this  chapter 
overlooks ways of describing this mathematically,² though as an example,
it  might  be  noted  that  “cellular  growth  of  organisms”  seems  to  be  made 
up  of  “chemical  reactions.” 

These  observations  are  the  essence  of  the  mathematical  approach 
to  the  description  of  phenomena.  Mathematically,  an  occurrence  of  a 
phenomenon  is  viewed  as  a  sequence  of  situations,  and  the  phenomenon 
itself  is  viewed  as  being  the  set  of  all  possible  ways  it  can  occur.  A 
phenomenon  is  described  by  a  mathematical  theory  of  all  ways  in  which 
it  can  occur;  such  a  theory  might  describe  it  as  either  being  made  up 
of, or a part of, other (describable) phenomena.

The first ingredient in the mathematical description of a phenomenon
is the specification of a time scale T and of a set X of all possible
situations. We may take T to be some subset of the real number line;
for the moment we can leave X unspecified. If X is the set of all possible
situations, then an occurrence $\theta$ is a function that associates to some of
the elements t of T unique corresponding elements $\theta(t)$ from X. A
phenomenon is a set of such functions $\{\theta_1, \theta_2, \ldots\}$, each representing an
occurrence. A complete description of a phenomenon is, then, a description
of its possibly infinite set of occurrences: The assumption that one
can find a mathematical description for a given phenomenon is equivalent
to the assumption that its set of occurrence functions is finitely
describable. Since some occurrences of a given phenomenon might
conceivably possess an infinite number of “details” (say, in the number
of times at which situations are defined, or in the number of “true statements”
about any particular situation), we may accept as a finite description
any finite rule that allows us to compute these occurrence
functions to an arbitrary accuracy. That is to say, we accept descriptions
that are “effectively” true.³


² For this reason, although the theory of phenomena outlined in this chapter
is adequate (to illustrate the limitations and generality of the mathematical
approach), it is not an especially efficient way of describing anything other than
very simple phenomena. Many approaches have been made toward developing a
more efficient way of describing complex, “real-world” phenomena: Chapter 8
discusses briefly the possible formalizations for “parallel processes” and “hierarchical
systems”; in Chapter 3 there is a brief discussion of logical systems for
describing real-world situations and their interrelationships (causality, etc.).

³ Thus, we shall ignore descriptions that describe “strictly noncomputable functions”
(see Exercise 2-3), but we can accept descriptions that describe undecidable
systems.

EXAMPLE 2-1. MOTION OF A WEIGHT ON A SPRING. In the
simplest  case  of  this  example,  where  the  motion  of  the  weight 
is  entirely  vertical,  one  can  describe  any  possible  situation  by  a 
single  real  number,  representing  how  far  the  spring  is  extended 
or  compressed  from  its  rest  position.  Thus,  the  set  X  of  all 
possible  situations  is  described  by  the  real-number  line.  Which 
particular  occurrence  of  the  phenomenon  happens  is  dependent 
on  such  things  as  the  mass  of  the  weight,  the  damping  factor  of 
the  spring,  the  initial  position  of  the  spring  and  weight,  the 
spring  constant  k,  etc.;  the  graph  of  any  given  occurrence  func¬ 
tion  will  generally  look  like  Fig.  2-1.  The  phenomenon,  or  class 
of  all  possible  occurrence  functions,  can  be  described  by  a 
single  equation  whose  variables  represent  the  factors  given  above. 
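
As an illustration of such a single equation, here is a minimal Python sketch, not taken from the original text. It assumes the standard underdamped solution for a weight released from rest; the function name spring_occurrence, its parameter names, and the numeric values are all hypothetical. Each choice of parameters selects one occurrence function, and the rule as a whole is a finite description of the entire class.

```python
import math


def spring_occurrence(mass, spring_constant, damping, initial_position):
    """Return one occurrence function theta(t) for the weight on a spring.

    Assumes the weight is released from rest at `initial_position` and that
    the motion is underdamped; both are assumptions of this sketch.
    """
    gamma = damping / (2.0 * mass)
    omega_squared = spring_constant / mass - gamma ** 2
    if omega_squared <= 0:
        raise ValueError("this sketch only covers the underdamped case")
    omega = math.sqrt(omega_squared)

    def theta(t):
        # Decaying envelope times an oscillation with zero initial velocity.
        envelope = initial_position * math.exp(-gamma * t)
        return envelope * (math.cos(omega * t) + (gamma / omega) * math.sin(omega * t))

    return theta


# One particular occurrence, chosen by fixing the factors named in the example.
theta = spring_occurrence(mass=1.0, spring_constant=4.0, damping=0.5, initial_position=1.0)
samples = [theta(t) for t in (0.0, 0.5, 1.0, 2.0)]
```

The situation at any time is a single real number, theta(t), so the set X of situations is the real-number line, as the example states.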


Figure  2-1.  An  old  friend  to  the  physics  student. 

The  considerations  presented  in  the  preceding  section,  on  mathe¬ 
matical  descriptions  in  general,  still  hold  for  the  specific  case  of  mathe¬ 
matical  descriptions  of  phenomena.  There  may  be  phenomena  that  are 
not  finitely  describable.  On  the  other  hand,  given  only  finite  sam¬ 
ples  of  the  occurrences  of  a  (possibly  infinite)  phenomenon,  there  is  no 
way to prove that the phenomenon is not finitely describable; the most
one can prove about it is that one’s efforts to describe it have so far
been unsuccessful. (Of course one might also prove that one’s efforts
to describe it have “so far” been successful.)

Types of Phenomena

“Things that happen” may often be distinguished from each other by
the nature of their occurrence functions. One⁴ of the basic classifications
defines three types of phenomena: discrete, nondiscrete, and continuous.

A phenomenon is discrete iff⁵ each of its occurrences is a step
function. A function $\theta$ is a step function iff it is constant or undefined
throughout any closed interval $[t, t']$ except for a finite number of “jump
discontinuities.” Specifically, let $[t, t']$ be any closed interval of T (possibly
a point, if $t = t'$): Then there exist a finite number of points $t_i$,
such that

$$t \le t_1 < t_2 < \cdots < t_n \le t'$$

and $\theta$ is either constant or undefined on each open subinterval $(t_{i-1}, t_i)$.
Figure 2-2 gives an example of a step function.


Figure 2-2. An occurrence of a discrete phenomenon (a step function plotted against time T).


⁴ Some other classifications are determinacy versus nondeterminacy, periodicity
versus nonperiodicity, etc. It should be kept in mind that these classifications
are really being applied to descriptions of phenomena, not to phenomena
themselves: For example, certain phenomena (electrons) can be described as
being either discrete (particles) or nondiscrete (waves).

⁵ “Iff” denotes “if and only if.”

An equivalent definition of discrete phenomena is the following:
Within an occurrence $\theta$ define an event to be an interval of time (closed
or open or semiclosed) on which $\theta$ is constant. Then a discrete phenomenon
is one such that each of its occurrences can be represented as a
sequence of events, in which any event is either “terminal” or “next to”
another  event.  An  event  is  said  to  be  terminal  if  no  event  follows  it  in 
time.  One  event  is  said  to  be  next  to  another  iff  no  event  occurs  be¬ 
tween  them  in  time. 

A  phenomenon  is  nondiscrete  iff  it  is  not  discrete.  Thus,  a  non¬ 
discrete  phenomenon  has  at  least  one  occurrence  in  which  there  is  a 
situation  that  is  followed  as  closely  in  time  as  one  chooses  to  look  by 
mutually  different  situations. 

A phenomenon is continuous iff it is nondiscrete, and for any occurrence
(and for all $t$, $t'$) the difference between the situation that
happens at time $t$ and the situation that happens at $t'$ tends to zero as
the difference between $t$ and $t'$ tends to zero. This definition, of course,
is  meaningful  only  in  cases  where  it  is  possible  to  establish  a  definition 
of  “difference”  that  can  be  applied  to  the  possible  situations. 

Throughout  this  book  we  shall  primarily  discuss  discrete  phe¬ 
nomena.  Our  reason  for  this  is  that  by  choosing  the  time  intervals  be¬ 
tween  situations  to  be  suitably  small,  one  can  find  occurrences  of  a 
discrete  phenomenon  that  will  match,  to  an  arbitrary  closeness,  the 
occurrences  of  any  nondiscrete  phenomenon.  Consequently,  if  there 
exists  a  finite  description  for  a  nondiscrete  phenomenon,  then  there  also 
exists  a  finite  description  for  a  discrete  phenomenon  that  approximates 
it  as  closely  as  one  wishes.  (We  can  merely  use  the  nondiscrete  descrip¬ 
tion  to  calculate  the  values  during  the  appropriate  discrete  events.) 

Let $A = \{\theta_1^A, \theta_2^A, \ldots\}$ be a nondiscrete phenomenon, and
$B = \{\theta_1^B, \theta_2^B, \ldots\}$ be a discrete phenomenon. If A is continuous, then B
matches A to a closeness $\delta$ if there exists a number $\epsilon$ greater than zero
such that, for all $t$, $t'$, if $|t - t'| < \epsilon$, then

$$|\theta_i^A(t) - \theta_i^B(t')| < \delta$$

If A is noncontinuous, then B matches A to a closeness $\delta$ if for all $t$
such that

$$\theta_i^A(t) \ne \theta_i^B(t)$$

there exists an $\epsilon$ such that $|\epsilon| < \delta$ and

$$\theta_i^A(t + \epsilon) = \theta_i^B(t + \epsilon)$$

It is always possible to find a discrete phenomenon that will match a
given, finitely describable, nondiscrete phenomenon. Similarly, if A is a
discrete phenomenon, then B simulates A if, for all i and for all t,
$\theta_i^A(t)$ being defined implies that $\theta_i^A(t) = \theta_i^B(t)$. If B simulates A, then
the occurrences of A are reproduced exactly within the occurrences of
B; an occurrence of B may, however, contain situations that do not
happen within the corresponding occurrence of A. If B matches A,
then the occurrences of B reproduce those of A in an approximate
sense⁶; thus, if B matches A, we shall also say that B “simulates” A,
approximately.
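
The idea of matching can be tried numerically. The sketch below is an illustration in Python under simplifying assumptions, not the text's own construction: it builds a step function by holding a continuous occurrence's value over intervals of a chosen length, then measures the largest sampled gap between the two at equal times. The names discretize and worst_gap, the sine-wave occurrence, and the sampling scheme are all hypothetical.

```python
import math


def discretize(theta, step):
    """Return a step function that holds theta's value from the start of
    each interval [n*step, (n+1)*step)."""
    def theta_step(t):
        return theta(math.floor(t / step) * step)
    return theta_step


def worst_gap(theta, theta_step, start, stop, samples=10_000):
    """Largest sampled difference between the two functions on [start, stop].

    Sampling gives only an estimate of the true worst case.
    """
    times = (start + (stop - start) * k / samples for k in range(samples + 1))
    return max(abs(theta(t) - theta_step(t)) for t in times)


# A sine wave stands in for one occurrence of a continuous phenomenon.
coarse = worst_gap(math.sin, discretize(math.sin, 0.1), 0.0, 10.0)
fine = worst_gap(math.sin, discretize(math.sin, 0.001), 0.0, 10.0)
```

Because the sine function never changes faster than one unit per unit of time, the gap cannot exceed the interval length, so shortening the intervals brings the discrete occurrence as close to the continuous one as desired.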

We  shall  see  below  that  it  is  possible  to  construct  a  tool — a  uni¬ 
versal  digital  computer — that  can  reproduce  exactly  the  occurrences  of 
any  mathematically  describable  discrete  phenomenon.  By  suitably  pro¬ 
gramming  a  fast  enough  digital  computer,  one  can  simulate  any  finitely 
describable  phenomenon,  regardless  of  whether  it  is  discrete  or  non- 
discrete  or  continuous.  If  intelligence  is  a  finitely  describable  phenome¬ 
non,  then  it  can  theoretically  be  simulated  on  a  (fast  enough,  big 
enough)  computer. 


Discrete  Phenomena 

The  preceding  section  gave  a  definition  for  discrete  phenomena. 
The  fact  that  there  is  a  sense  in  which  one  can  approximate  any  non¬ 
discrete  phenomenon  to  an  arbitrary  degree,  using  discrete  phenomena, 
gives  sufficient  reason  to  investigate  the  subject  of  finite  (mathematical) 
descriptions  for  discrete  phenomena.  What  we  desire  is  some  way  of 
characterizing  all  such  descriptions.  We  shall  see  that  this  characteriza¬ 
tion  is  provided  by  automata  theory. 

In  this  respect  the  main  thing  to  note  is  that  we  can  describe  any 
step  function  by  a  string,  or  sequence  of  symbols,  provided  we  adopt 
an appropriate notation. Let us see how this could be done, using Fig.
2-2 as an example.


⁶ These definitions describe “real-time” matching and simulation: They can
be broadened to include notions of relative speed. Also, the equality sign can be
taken to mean something like “is isomorphic to.” However, it should be noted
that even though a discrete phenomenon B may match a nondiscrete phenomenon
A, in general (and even if A is continuous) the set

$$\{t \mid \theta_i^A(t) = \theta_i^B(t)\}$$

will be of measure zero. Moreover, if A is noncontinuous, then the set of points
(in time) at which B will be “close” (within a given $\delta$) to A will in general be
of measure zero; an expression that describes this set of points is

$$\{t \mid (|\theta_i^A(t) - \theta_i^B(t)| < \delta)\}$$

Thus, the ability of discrete phenomena (machines) to “simulate” arbitrary
nondiscrete, noncontinuous phenomena is somewhat limited.

In Fig. 2-2, the first event (note 2-7) is the happening of situation
2, which starts at t = 1 and ends at t = 2. Thus, we make the beginning
of our descriptive string

2,1+,2+

The next event is the happening of situation 1, which occurs “during the
instant” t = 2.5. The descriptive string now becomes

2,1+,2+,1,2.5+,2.5+

And so we continue, using minus signs whenever an event starts “immediately
after” (note 2-8) some time t or ends “immediately before.”
The final descriptive string would be

2,1+,2+,1,2.5+,2.5+,2,3+,4-,3,5-,6+

In  general,  any  step  function  can  be  represented  by  such  a  descrip¬ 
tive  string.  If  the  function  is  defined  only  on  a  bounded  time  interval, 
then  its  descriptive  string  will  be  of  finite  length,  even  though  the  total 
number  of  points  for  which  the  function  is  defined  may  be  infinite;  for 
example, the descriptive string for Fig. 2-2 is finite, although the step
function is defined for an infinite number of values of t (note 2-9).
Likewise,  if  the  step  function  does  not  have  a  beginning  or  does  not 
have  a  terminal  event,  then  its  descriptive  string  will  be  infinitely  long. 
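
The encoding just described can be sketched as a short Python routine. This is an illustration rather than the book's own procedure: the Event record and the function descriptive_string are hypothetical names, and the reading of the signs (a plus for an endpoint that belongs to the event, a minus for an event that starts immediately after or ends immediately before the stated time) is an assumption drawn from the worked example. The four events listed are likewise an assumed reading of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    situation: int               # the constant value held during the event
    start: float
    end: float
    start_included: bool = True  # False: starts "immediately after" start
    end_included: bool = True    # False: ends "immediately before" end


def _endpoint(time, included):
    # Assumed convention: "+" marks an included endpoint, "-" an excluded one.
    return f"{time:g}{'+' if included else '-'}"


def descriptive_string(events):
    """Encode a finite sequence of events as one comma-separated string."""
    parts = []
    for event in events:
        parts.append(str(event.situation))
        parts.append(_endpoint(event.start, event.start_included))
        parts.append(_endpoint(event.end, event.end_included))
    return ",".join(parts)


events = [
    Event(2, 1, 2),
    Event(1, 2.5, 2.5),                    # an event lasting a single instant
    Event(2, 3, 4, end_included=False),
    Event(3, 5, 6, start_included=False),
]
print(descriptive_string(events))  # 2,1+,2+,1,2.5+,2.5+,2,3+,4-,3,5-,6+
```

Four events yield a string of twelve entries, even though the step function itself is defined at infinitely many points of time.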

Since  any  step  function  can  be  represented  by  a  descriptive  string, 
any  set  of  step  functions  can  be  represented  by  a  set  of  descriptive 
strings.  Thus,  to  finitely  describe  the  occurrences  of  a  discrete  phenome¬ 
non,  one  need  only  be  able  to  finitely  describe  a  certain  set  of  strings: 
If the set is finite, we could simply list all its strings⁷ (provided none of
them  is  infinite),  but  what  if  the  set  of  descriptive  strings  is  infinite? 

The answer to this problem lies in the following analysis: Even
though the set is infinite, we can assign a number 1, 2, . . . to each string
in the set and proceed to talk of the first descriptive string, the second
descriptive string, and so on (note 2-10). Then, if we can find a finitely
describable rule that computes for each n the nth descriptive string, we
will in effect have found a finite description for the phenomenon. Thus,
we can transfer our efforts from the finite description of discrete phenomena
to the finite description of functions. Any discrete phenomenon
is capable of being represented as a function that associates a unique
descriptive string to each natural number. And, since any natural number
can be represented by a string (of finite length), we are therefore
concerned with finding finite descriptions of functions that map one set
of strings (those representing natural numbers) into another set of
strings (those representing step functions).

⁷ In practice, there are limits to the size of finite sets that can be enumerated
(see note 2-4 and Chapter 3). Such sets are called “finite, effectively infinite.”
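
A small Python sketch may help show what such a rule looks like. The phenomenon here is invented purely for illustration (a hypothetical “blinker” whose nth occurrence consists of n events of situation 1, each lasting from an even time up to, but not including, the next odd time), and the function name nth_descriptive_string is an assumption. The point is only that a few lines of rule describe an infinite set of descriptive strings.

```python
def nth_descriptive_string(n):
    """Return the descriptive string of the n-th occurrence of a
    hypothetical 'blinker' phenomenon.

    Situation 1 holds on [0, 1), [2, 3), [4, 5), ... for n events in all;
    between events the occurrence is left undefined.
    """
    if n < 1:
        raise ValueError("occurrences are numbered 1, 2, 3, ...")
    events = [f"1,{2 * k}+,{2 * k + 1}-" for k in range(n)]
    return ",".join(events)


print(nth_descriptive_string(1))  # 1,0+,1-
print(nth_descriptive_string(3))  # 1,0+,1-,1,2+,3-,1,4+,5-
```

The rule maps one string (the numeral for n) to another string (the description of a step function), which is the kind of function the following paragraphs assign to automata theory.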

The  mathematical  theory  that  deals  with  functions  that  map  one 
set  of  strings  into  another  set  of  strings  is  automata  theory;  a  general 
way  of  characterizing  functions  of  this  sort  is  through  the  use  of  Turing 
machines.  Automata  theory  is  basically  concerned  with  studies  on  the 
nature of Turing machines, its underlying hypothesis⁸ being that this
is the nature of all discrete machines; the concept of machine is to be
identified with that of “finitely describable phenomenon.” In this chapter
we  are  concerned  with  some  of  the  simplest  types  of  machines.  Auto¬ 
mata  theory  discusses  the  abstract  nature  of  machines,  but  it  can  include 
such  aspects  of  real-world  machines  as  their  cost  and  probability  of 
error. 

Briefly, a Turing machine is composed of a finitely describable
black box and an infinite, or potentially infinite,⁹ tape (Fig. 2-3). The
tape is divided into squares, each of which has a symbol (possibly the
“blank” symbol) printed on it. The black box contains two subcomponents,
a control and a tapehead; the tapehead is capable of scanning
and writing symbols on one square of tape at a time, and of moving the
tape either to the right or the left, all under instructions given to it by
the control. The tapehead sends to the control the information as to
what symbol it happens to be scanning, and the control decides on the
basis of that information and a finite “memory” what actions it should
instruct the tapehead to perform.

Figure 2-3. A Turing machine.

8 Again, Church’s thesis or Turing’s thesis.

9 By potentially infinite is meant that there is someone nearby ready to add
more squares to the tape if necessary.


Mathematics,  phenomena,  machines 

Although this may seem like a very simple type of machine with
very limited capabilities, such is not the case. In fact, all evidence
available to date indicates that Turing machines are capable of computing
any finitely describable, computable function that maps one set of strings
into another set of strings. There exist certain Turing machines which,
given a suitable program, are capable of simulating the computations of
any Turing machine. It can be shown that a Turing machine can
effectively derive all provable theorems in any given mathematical
theory. Indeed, Turing machines are capable of simulating¹⁰ the
phenomenon of self-reproduction. Therefore the rest of this chapter is
devoted to a discussion of some results from automata theory.

Finite-State  Machines 

Of  all  the  elements  of  a  Turing  machine,  the  only  one  that  requires 
mathematical  formalization  is  the  control:  We  need  to  specify  more 
exactly  how  it  is  able  to  make  decisions,  what  its  memory  is,  etc.  We 
now  give  a  general  definition  of  that  class  of  machine  which  may  serve 
as  a  control  in  a  Turing  machine;  the  machines  in  this  class  are  usually 
referred  to  as  finite-state  automata. 

DEFINITION 2-1. A finite-state machine, or finite automaton,
is a quintuple, M = (Q, X, Y, δ, λ), where:

Q is a finite set, the set of states;

X is a finite set, the set of input symbols;

Y is a finite set, the set of output symbols;

δ: Q × X → Q, the next state function;

λ: Q × X → Y, the next output function.

Any quintuple of sets and functions satisfying this definition is to
be interpreted as the mathematical description of a machine that, if
given an input symbol x while it is in state q, will output the symbol
λ(q,x) and go to state δ(q,x). (The two functions δ and λ together are
often referred to as the transition function of the finite-state machine.)

Thus, a finite automaton is a machine that can exist in a finite set
of states, where the particular state it is in at any given moment depends
upon the inputs it has received and upon its previous states. The set of
states in an automaton serves as its “memory”: The only information
that an automaton has concerning its past operation is the current state
it is in; at least, this is the only information it can use in deciding its
next state and its next output when it is given an input symbol. Some
examples would be instructive at this point.

10 Discretely. (See Chapter 8 for a discussion of self-reproducing machines.)

EXAMPLE 2-2. A PARITY-CHECKING MACHINE. This machine
has only two states; the machine will accept any finite string of
zeros and ones; its output at a given moment will be the word
“even” if the string it has so far received has an even number
of ones, and “odd” otherwise, provided it starts in the “initial
state” q0. Let Q = {q0, q1}, X = {0, 1}, Y = {“even,” “odd”},
and define δ and λ by the following tables:

    δ  |  q0       q1
    ---+------------------
    0  |  q0       q1
    1  |  q1       q0

    λ  |  q0       q1
    ---+------------------
    0  |  “even”   “odd”
    1  |  “odd”    “even”

For example, δ(q0,1) = q1, λ(q0,1) = “odd”.

The reader should verify for himself that this machine does what
it is supposed to do, provided it is started in state q0.
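
The verification can also be carried out mechanically. The following is a minimal sketch in Python (not the book's notation) that stores the two tables of Example 2-2 as dictionaries and steps the machine through an input string. The names run_fsm, delta, and lam are illustrative assumptions, not part of the original text.

```python
# Parity checker of Example 2-2, written as a quintuple (Q, X, Y, delta, lam).
Q = {"q0", "q1"}
X = {"0", "1"}
Y = {"even", "odd"}

# Next-state function delta: Q x X -> Q
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

# Next-output function lam: Q x X -> Y
lam = {
    ("q0", "0"): "even", ("q0", "1"): "odd",
    ("q1", "0"): "odd",  ("q1", "1"): "even",
}


def run_fsm(delta, lam, start, inputs):
    """Feed the input symbols one at a time; return (list of outputs, final state)."""
    state, outputs = start, []
    for x in inputs:
        outputs.append(lam[(state, x)])   # output depends on current state and input
        state = delta[(state, x)]         # then the machine changes state
    return outputs, state


outputs, final_state = run_fsm(delta, lam, "q0", "1101")
print(outputs)       # ['odd', 'even', 'even', 'odd']
print(final_state)   # q1
```

Each output reports the parity of the number of ones received so far, including the symbol just read, which is the behavior the example asks the reader to confirm.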

Actually,  the  use  of  tables  to  define  the  functions  8  and  A  is  rather 
clumsy  and  inefficient;  if  we  were  dealing  with  larger,  more  complicated 
machines,  it  would  be  very  difficult  to  understand  just  what  they  were 
doing.  It  is  customary  to  use  a  certain  type  of  drawing,  called  a  state- 
transition  diagram,  to  describe  a  finite  automaton.  Figure  2-4  gives  a 
state-transition  diagram  for  the  parity  checker. 

In such a diagram each state is represented by a circle; each transition
between states is represented by an arrow; the input symbol causing
the transition appears at the tail of the arrow, while the corresponding
output symbol is inserted in the middle of the arrow.

Figure 2-4. Parity checker.

Another  good  example  of  a  finite  automaton  is  a  machine  that  adds 
two  binary  numbers,  provided  they  are  suitably  encoded  into  a  string. 

EXAMPLE 2-3. A BINARY ADDER. Let Q = {“nocarry,” “carry”},
X = {00,01,10,11}, Y = {0,1}, and let the functions δ and λ
be given by the state-transition diagram in Fig. 2-5. To add two
binary numbers, say 1101 and 10101 (decimal 13 and 21,
respectively), we first reverse them so that they are expressed
with their least significant digits first: 1011 and 10101. Next
we add sufficient zeros to them to make both strings be of the
same length and end in zero: 101100 and 101010. Finally, we
encode the two strings into a single string, whose symbols come
from the set X, by taking the first symbols of each string and
replacing them by their corresponding ordered pair, taking the
second symbols and doing the same, and so on. The string we
obtain is

11,00,11,10,01,00

If we feed this string into the binary adder, then the sequence
of outputs that we get is

010001

This is the reverse of the binary number 100010 = thirty-four.
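
The encoding and the addition can be traced in a short Python sketch. Since the diagram of Fig. 2-5 is not reproduced in the text, the transition rule in adder_step is an assumption: it is the ordinary rule for adding two bits and a carry. The function names encode, adder_step, and add are likewise illustrative.

```python
def encode(a, b):
    """Encode two non-negative integers as a list of bit pairs, least
    significant digits first, padded so both strings end in zero."""
    ra, rb = bin(a)[2:][::-1], bin(b)[2:][::-1]   # reversed binary strings
    n = max(len(ra), len(rb)) + 1                 # one extra 0 to flush the carry
    ra, rb = ra.ljust(n, "0"), rb.ljust(n, "0")
    return [x + y for x, y in zip(ra, rb)]


def adder_step(state, pair):
    """One transition: return (next state, output bit)."""
    total = int(pair[0]) + int(pair[1]) + (1 if state == "carry" else 0)
    return ("carry" if total >= 2 else "nocarry"), str(total % 2)


def add(a, b):
    """Run the two-state adder over the encoded input and collect its outputs."""
    state, out = "nocarry", []
    for pair in encode(a, b):
        state, bit = adder_step(state, pair)
        out.append(bit)
    return "".join(out)


print(encode(13, 21))             # ['11', '00', '11', '10', '01', '00']
print(add(13, 21))                # 010001
print(int(add(13, 21)[::-1], 2))  # 34
```

The machine needs only two states because the carry is the only fact about earlier digits that affects later ones.
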
These two examples illustrate that finite-state machines do have
some computational ability and that they can be used in at least two
slightly different ways. The first example shows that it is possible to use
an automaton as an acceptor for a certain set of strings: If we replaced
its output set by “true” and “false,” respectively, then the parity checker
would output “true” after the input of any finite string that contained
an even number of 1’s; that is, it would accept the set of all such
strings.¹¹ The binary adder, on the other hand, illustrates that we can
use finite-state machines to represent some of the functions that map
one set of strings into another set of strings.

Figure 2-5. Binary adder.

However, there are many functions that no finite-state machine
can compute: One such function is multiplication. The reader might try
his hand at designing a finite-state machine to multiply any two numbers.
The basic reason it cannot be done is that the operation requires
saving the complete information about each of the two numbers, and
this requires either an infinite number of states or an infinite tape.

In fact, finite-state machines are only the building blocks of automata
theory; they represent the simplest type of machine, one in which
the future of an occurrence can depend on only a finite number of
different “past histories,” or states.

Turing  Machines 

Simple  Turing  Machines 

Let  us  now  return  to  our  original  discussion  of  Turing  machines. 
The  reader  will  recall  that  these  were  described  as  the  most  general  type 
of  discrete  machine;  so  far  as  anyone  knows,  any  function  that  can  be 
computed  can  be  computed  by  a  suitable  Turing  machine. 

DEFINITION 2-2. A Turing machine (Tm) is an ordered quintuple,
T = (Q, X_b, P, q0, F), where:

Q is a finite set of states;

X_b is a finite set of tape symbols, one of which is the blank
symbol b;

P is the next-move function, a mapping from Q × X_b to X_b ×
{L,0,R} × Q in which L, 0, R are symbols meaning “go to
the left,” “stay at the same place,” “go to the right”;

q0 is an element of Q, called the start state, or initial state;

F is a subset of Q and is called the set of final states.

The operation of a Turing machine begins with the machine being
in q0 and examining the leftmost symbol of a string¹² from X_b* that is
printed on some of the squares of its tape (every other square of the
tape contains a blank symbol). The next-move function P determines
what symbol the tapehead prints on the square it is examining, whether
the tapehead moves left or right one square or remains at the same
square, and what state becomes the new state of the control.

11 This acceptor is also a decider; that is, it rejects those finite strings not
belonging to the set it accepts.

12 If A is a set, then by A* we denote the set of all finite strings whose
symbols are elements of A. Thus, {a,b}* = {ε,a,b,ab,ba,aa,bb,aaa,bbb,aba,bab,
aab,baa, ...}, where ε denotes the empty string, which does not contain any
symbols.

The  next-move  function  P  can  be  finitely  described,  and  there  is 
no  difficulty  in  considering  the  control  of  the  Turing  machine  to  be  a 
finite-state  machine.  Thus,  the  only  essential  difference  between  Turing 
machines  and  finite-state  automata  lies  in  the  fact  that  a  Turing  machine 
is  able  to  store  its  output  on  a  potentially  infinite  tape  and  refer  to  it 
later.  This  single  difference  (note  2-11)  is  enough  to  enable  Turing 
machines  to  be  used  as  acceptors  for  a  class  of  sets  much  larger  than 
that  of  those  accepted  by  finite-state  machines,  and  it  is  enough  for 
Turing  machines  to  be  able  to  compute  a  class  of  functions  much 
larger  than  that  of  those  which  can  be  computed  by  finite-state 
machines.  The  sets  that  can  be  recognized  (i.e.,  accepted)  by  Turing 
machines  are  the  recursively  enumerable  sets;  the  functions  that  are 
computable  by  Turing  machines  (henceforth  Tm-computable)  are  the 
partial-recursive  functions. 

Another way of stating Church’s thesis or Turing’s thesis is to say:
Any computable function can be represented as a partial-recursive
function. So far, every general definition of “finitely describable,
computable functions that map one set of strings into another set of
strings” has been shown equivalent to the definition for Turing machines.


EXAMPLE 2-4. A UNARY DOUBLER. We define a simple Turing
machine that will produce a string of 1’s of length 2m if it is
given an input tape containing a string of 1’s of length m. This
is a computation that cannot be done (for all m) by any
finite-state machine. Let

Q = {q0, q1, q2, q3, q4}

X_b = {b, 0, 1, A}

F = {q0}

and let the next-move function P be defined by Table 2-1.

TABLE 2-1. A Unary Doubler

    Q  ×  X_b    →    X_b  ×  {L,0,R}  ×  Q

    q0    1           0       R           q1
    q1    1           1       R           q1
    q1    0           0       R           q1
    q1    b           A       R           q2
    q1    A           A       R           q2
    q2    1           1       R           q2
    q2    b           1       R           q3
    q3    b           1       L           q4
    q4    1           1       L           q4
    q4    A           A       L           q0
    q0    0           0       L           q0
    q0    b           A       0           q0
    q0    A           A       0           q0

If this machine is started on a string of the form

    ...bb11...11bbb...        (m 1’s)

at the leftmost 1, it will eventually halt in state q0 (here the
initial state is also the halting state), and its tape will hold a
string of the form

    ...bA00...00A11...11bbb...        (m 0’s, then 2m 1’s)

TABLE 2-2. Operation of the Unary Doubler
(the square being scanned is enclosed in brackets)

     1.  (0)  ...bb[1]11bb...
     2.  (1)  ...bb0[1]1bb...
     3.  (1)  ...bb01[1]bb...
     4.  (1)  ...bb011[b]bb...
     5.  (2)  ...bb011A[b]b...
     6.  (3)  ...bb011A1[b]bb...
     7.  (4)  ...bb011A[1]1bb...
     8.  (4)  ...bb011[A]11bb...
     9.  (0)  ...bb01[1]A11bb...
    10.  (1)  ...bb010[A]11bb...
    11.  (2)  ...bb010A[1]1bb...
    12.  (2)  ...bb010A1[1]bb...
    13.  (2)  ...bb010A11[b]bb...
    14.  (3)  ...bb010A111[b]bb...
    15.  (4)  ...bb010A11[1]1bb...

Table 2-2 shows the first 15 steps of the operation of the unary
doubler. The eighth entry in the table, for example, is

8. (4) ...bb011A11bb...

which means that in this step the machine is in state q4, scanning
the A in the string 011A11, with an otherwise blank tape. The
reader should be able to continue the table and verify that
the machine will reach a step

(0) ...bA000A111111bb...

and that no further changes will take place on the tape. (It is
a simple matter to add extra states that get rid of the output
zeros.)
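
Continuing the table by hand is tedious, so a small simulator may help. The Python sketch below is an illustration only: the thirteen entries of P are a reading of Table 2-1 that agrees with the steps of Table 2-2, and the halting rule used here (stop when an entry would change neither the tape, the head position, nor the state) is an assumption chosen to match the remark that no further changes take place. The names run_tm, P, and MOVES are hypothetical.

```python
from collections import defaultdict

BLANK = "b"
MOVES = {"L": -1, "0": 0, "R": 1}

# Next-move function: (state, scanned symbol) -> (symbol to print, move, next state)
P = {
    ("q0", "1"): ("0", "R", "q1"),
    ("q1", "1"): ("1", "R", "q1"),
    ("q1", "0"): ("0", "R", "q1"),
    ("q1", "b"): ("A", "R", "q2"),
    ("q1", "A"): ("A", "R", "q2"),
    ("q2", "1"): ("1", "R", "q2"),
    ("q2", "b"): ("1", "R", "q3"),
    ("q3", "b"): ("1", "L", "q4"),
    ("q4", "1"): ("1", "L", "q4"),
    ("q4", "A"): ("A", "L", "q0"),
    ("q0", "0"): ("0", "L", "q0"),
    ("q0", "b"): ("A", "0", "q0"),
    ("q0", "A"): ("A", "0", "q0"),
}


def run_tm(P, tape_string, start="q0", max_steps=100_000):
    """Run the machine from the leftmost symbol of tape_string.
    Return (final state, non-blank portion of the tape)."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_string))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in P:
            break                                  # no applicable entry
        symbol, move, nxt = P[key]
        if symbol == tape[head] and move == "0" and nxt == state:
            break                                  # nothing would change any more
        tape[head] = symbol
        head += MOVES[move]
        state = nxt
    else:
        raise RuntimeError("step limit reached without halting")
    cells = [i for i, s in tape.items() if s != BLANK]
    text = "".join(tape[i] for i in range(min(cells), max(cells) + 1)) if cells else ""
    return state, text


print(run_tm(P, "111"))   # ('q0', 'A000A111111')
```

The dictionary used for the tape grows only as the head visits new squares, which plays the role of the "potentially infinite" tape.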

The main thing to be learned from Example 2-4 is that a Turing
machine typically manages to surpass the limitations of the finite automaton
by using “dummy symbols” to store on its tape information
about its past operation. In the example, the dummy symbols are 0
and A, where 0 serves to store the information that a certain unit has
already been doubled, and the appearance of two A’s on the tape
represents the information (for us) that the machine has finished its
computation. The tape of a Turing machine is thus a very significant
part of its memory.

Polycephalic Turing Machines

The Turing machine concept described above is very cumbersome
for use on any but the simplest problems. It is more common to consider
“polycephalic” Turing machines, which possess several (n-dimensional)
tapes, each with its own finite number of tapeheads (Fig.
2-6); this model comes closer to the actual structure of modern
computers. The formalization for polycephalic machines is relatively
easy to construct. The relevant things to consider are:

1. The number of tapes the machine uses, say n.

2. The dimensionality of each tape. We can let each tape be an
m-dimensional grid (m is variable), and have each tape
square be specified by an m-tuple of integers (e.g., <2,0,
-5,1,-3,6>). The dimensionality of the ith tape can be denoted
by a function δ(i).

3. The (finite) alphabet used on each tape.

4. The number r of tapeheads used by the machine on each
tape. These can be denoted T_1^i, T_2^i, ..., T_j^i, ..., T_r^i for
the ith tape.

5. What to do if two or more tapeheads are instructed to print
on the same square at the same time. For each i, we can use
a “dominance relation” R^i that determines a unique “greatest”
tapehead T_j^i for any given set of tapeheads {T_k^i}.

6. The set of states Q for the control; also its initial state and
its set of final states F.

7. The next-move function P, which for each tapehead T_j^i maps
Q × X^i into X^i × D^i × Q. We let D^i denote the set of unit
direction vectors for the ith tape; e.g., <1,0,-1,1,01,0> is
a unit direction vector for a six-dimensional tape. We can
assume that the tapeheads for each tape all start at the origin
<0,0,...,0> of the tape.

The specification of (1) through (7) above then determines an
individual polycephalic Turing machine.

Figure 2-6. A polycephalic Turing machine.

The only advantage of polycephalic Turing machines over simple
Turing machines is that they are more efficient to use: They are not
more general with respect to the number or the nature of their uses.
Any function that can be computed by a polycephalic Tm can also be
computed by a suitable, ordinary Tm.

Universal Turing Machines

One of the most surprising and important facts is that some
Turing machines are capable of simulating the computations of any
Turing machine. These machines are called “universal Turing machines”;
the actual reason for their existence lies in two facts:

1. Any string containing only a finite number of different symbols
can be “coded” as a unary string, consisting only of the symbols 1 and
the “blank” symbol b.

2. Any Turing machine can be described by a finite string of
symbols.

To show the first fact, note that a unique string of 1’s can be
assigned to each symbol in a set if the set contains only a finite number
of symbols. Consequently, any string consisting only of symbols from
that set can be represented by a string of the form “...bb1...1b1...
1b1...1bb...,” consisting of a variable number of blocks of 1’s, each
of variable length, each block separated from the next by a single b.

To see the second fact, examine Table 2-1 and note that the
total number of symbols used in the table is finite. Thus, the table itself
can be represented as a (finite) string of quintuples, each of the form
(q, x, x′, d, q′). If one assigns a suitable unary coding to each of the symbols
in Table 2-1, then any quintuple can be represented uniquely by a
certain string of b’s and 1’s; thus the table can be represented by a
string of b’s and 1’s in which certain substrings stand for quintuples.
The unary string representing the quintuples of a given Turing machine
T will be denoted by d_T and called the descriptive string for the
Turing machine (not, in general, the same thing as a descriptive string
for an occurrence of a discrete phenomenon).
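
One possible unary coding can be sketched in Python. The particular scheme below is an assumption made for illustration (the k-th symbol in a fixed list is coded as a block of k 1’s, and every five consecutive blocks form one quintuple); the text does not prescribe a specific code. The names make_code, encode_table, and decode_table are hypothetical.

```python
def make_code(symbols):
    """Assign each symbol a distinct block of 1's: the k-th symbol gets k 1's."""
    return {s: "1" * (k + 1) for k, s in enumerate(symbols)}


def encode_table(quintuples, code):
    """Turn a list of quintuples into one string of 1's and b's,
    with a single b between successive blocks."""
    blocks = [code[s] for quintuple in quintuples for s in quintuple]
    return "b".join(blocks)


def decode_table(description, code):
    """Recover the quintuples: split into blocks, then group the blocks by fives."""
    inverse = {block: s for s, block in code.items()}
    symbols = [inverse[block] for block in description.split("b")]
    return [tuple(symbols[i:i + 5]) for i in range(0, len(symbols), 5)]


# Every state, tape symbol, and direction symbol that appears in Table 2-1.
SYMBOLS = ["q0", "q1", "q2", "q3", "q4", "b", "0", "1", "A", "L", "R"]
code = make_code(SYMBOLS)

# Two rows of Table 2-1, written as (q, x, x', d, q').
rows = [("q0", "1", "0", "R", "q1"), ("q1", "b", "A", "R", "q2")]

d_T = encode_table(rows, code)
print(d_T)
print(decode_table(d_T, code) == rows)   # True
```

Because every quintuple has exactly five entries, no special separator between quintuples is needed in this scheme; position alone tells a state from a tape symbol.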

The actual construction of a universal Turing machine U is not
very difficult, and the student should either try it for himself or consult
one of the references on automata theory. For our purposes here it is
simpler, and equally valid, to rely on the following description (see
Fig. 2-7): U works with two tapes, each acted upon by a single tapehead;
the first tape contains the descriptive string d_T for the Turing
machine T that U is going to simulate, while the second tape contains a
unary string i representing (in the same code as that of d_T) the input
to T; thus, no matter how many states or symbols T may use, the
machine U will use only the symbols b and 1 and a few dummy symbols
of its own.


Figure  2-7.  A  universal  Turing  machine. 

To simulate T, U keeps a unary string representing the “current
state” and the “current symbol scanned” of T on its first tape, and it
uses this information and d_T to compute the corresponding actions it
should take with respect to its second tape. In other words, U does
essentially the same thing a person does when he traces the operation
of a given Turing machine on a given input tape: It merely keeps track
of where T is, of what state T is in, and of what symbol T is scanning,
and it looks in a table (d_T) to find out what actions T would take; then
it implements those actions on its own model of T’s tape.

A universal Turing machine, then, is one that can be “given
a program” that enables it to simulate a Turing machine: In fact, a
universal Tm is theoretically equivalent to a general-purpose, discrete
(or digital) computer, and the program one gives a digital computer is
analogous to a descriptive string for some Turing machine T. That
is, a computer program is a descriptive string for a function (T) that
maps one set of strings (the possible inputs to T) into another set of
strings (the possible outputs from T). Computers, then, are mechanisms
for implementing finitely describable processes of symbol manipulation.

The fact that there are universal machines,¹³ or computers, is very
significant  if  we  are  investigating  the  behavior  of  machines  in  general. 
It  enables  us  to  conduct  our  investigations  by  referring  to  the  behavior  of 
a  single  machine  as  it  is  given  various  programs,  rather  than  by  building 
a  new  machine  each  time  we  want  to  observe  a  new  behavior.  In 
particular,  it  makes  feasible  a  search  for  machines  that  simulate  the 
abilities  of  intelligence.  The  work  described  in  the  following  chapters 
would  simply  not  be  possible  without  digital  computers.  The  reader 
who  wishes  to  pursue  the  study  of  digital  computers  is  invited  to  see 
the books by Bartee (1966), Bell and Newell (1971), Chapin, McCormick
(1959), and Trakhtenbrot (1963). Papers and books relevant
to  the  history  of  computers  are  those  of  Aiken  (1937),  Babbage 
(1864),  Bernstein  (1964),  Burks  et  al.  (1946),  Bush  (1945),  Gardner 
(1958,  1970,  1971),  S.  Rosen  (1971),  Rosenberg  (1969),  Price 
(1959),  Pylyshyn  (1970),  Shannon  (1948,  1953),  T.  M.  Smith 
(1970),  and  von  Neumann  (1951).  The  books  by  Arbib  (1964,  1968, 
1969,  1972)  and  Minsky  (1967,  1969)  are  excellent  introductions  to 
the  automata-theoretic  nature  of  computers. 


LIMITS  TO  COMPUTATIONAL  ABILITY 

At  this  point  the  major  purpose  of  this  chapter  has  been  satisfied, 
which  was  to  show  how  it  is  that  one  can  investigate  finitely  describable 
phenomena  in  general  (and,  especially,  hope  to  simulate  intelligence)  by 
using  computers. 

It remains to complete the survey of those general limitations that
can be placed upon the success of artificial intelligence research. We
have already seen one such limitation, which is that the results of AI
research must always be finitely describable: If natural intelligence is
not a finitely describable phenomenon, then the best that AI research
can do is to simulate some, but not all, of its abilities.

There is no scientific evidence that natural intelligence is not
finitely describable; indeed, we have tried to show that there cannot be
such evidence. On the other hand, there is some scientific evidence that
natural intelligence is finitely describable; namely, the evidence that the
brains of certain animals do “possess intelligence,” plus the fact that
these brains each contain finite numbers of cells. However, the evidence
concerning the actual function and nature of brain cells is far from
final, and the exact way in which the intelligence of a brain is dependent
upon its cells is still unknown (see Chapter 1). The most one can say is
that the finite describability of true intelligence is likely but not proved.

13 Not all Turing machines are universal.

Another  general  limitation  concerning  the  properties  of  artificial 
intelligences  can  be  derived :  It  can  be  shown  that  there  are  certain 
unsolvable  problems,  which  cannot  be  solved  by  any  machine,  that  is, 
by  any  finitely  describable  process;  artificial  intelligence  research,  then, 
can  never  produce  a  machine  intelligent  enough  to  solve  one  of  these 
problems. 

Before discussing one such unsolvable problem (the famous Halting
Problem, first shown to be unsolvable by Turing), it is wise to note
that there is probably no way any natural intelligence can be shown
scientifically to be able to solve one of these problems. Certainly, unless
Turing’s thesis is false, no natural intelligence could ever give a finite
description of a way to solve one of these problems.

The Halting Problem can be stated as follows: For any Turing
machine T, given any input tape i, tell whether or not T will eventually
halt its computation. By halting is meant that T enters one of its final
states q_j ∈ F, and prints a certain symbol (say, the halt symbol H) on a
square of its tape; also, whenever H occurs in a quintuple in the next-move
function for T, the quintuple is always of the form
(In particular we have (q_j, H, H, 0, q_j).) Thus, an outside observer, given
a description of T, knows whenever he sees an H appear on the tape
of the Turing machine T that T is finished with its computation, and
will do no more (significant) manipulation of its tape.

It can be shown that there is no Turing machine capable of solving
the Halting Problem. That is, we can show that there does not exist
a Turing machine D which, given a description d_T for any Turing
machine T, and given any input tape i, will always compute in a finite
time whether or not T would eventually halt its computation if it were
given the input tape i. There are many ways to go about proving this
fact: One relatively simple way involves showing that if there were
such a machine D, then one could use it as part of a larger machine
(say, E) such that, given d_E and an arbitrary i, D would not be able to
compute whether E would ever halt. (See Minsky, 1967, for an exposition
of this approach.)
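
The shape of this argument can be illustrated with a toy Python sketch. It is not a proof and not Minsky's construction: real non-termination cannot be run, so “running forever” is modeled here by returning the marker LOOPS, and a candidate decider is simply any function that returns True or False. All names (make_contrarian, refutes, LOOPS, HALTS) are assumptions introduced for this illustration.

```python
LOOPS, HALTS = "loops", "halts"


def make_contrarian(decider):
    """Build the larger machine E out of a claimed halting decider D.
    E asks D what a program would do when given its own description,
    and then does the opposite."""
    def contrarian(program):
        if decider(program, program):   # D predicts "halts" ...
            return LOOPS                # ... so E (conceptually) runs forever
        return HALTS                    # D predicts "loops", so E halts at once
    return contrarian


def refutes(decider):
    """True if the decider's prediction about E run on itself is wrong."""
    e = make_contrarian(decider)
    predicted = HALTS if decider(e, e) else LOOPS
    actual = e(e)
    return predicted != actual


# Two naive candidate deciders; each is defeated by its own contrarian machine.
print(refutes(lambda program, tape: True))    # True
print(refutes(lambda program, tape: False))   # True
```

Whatever answer a candidate D gives about E applied to its own description, E is built to behave the other way, so no candidate can be right on every input.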

Given, then, that there are problems no artificial intelligence can
solve, it is natural to ask whether an artificial intelligence can be constructed
so as to recognize these problems whenever they arise in the
course of its operation, prove that they are unsolvable, and stop working
on them. In fact, it can be shown that no Turing machine (thus, no
artificial intelligence) is capable of recognizing all unsolvable problems.
For any mathematically describable problem-solving device there exists
at least one problem that the device cannot solve, and cannot recognize
to be unsolvable, provided the device is consistent (incapable of
producing contradictory answers if given noncontradictory premises)
and capable of doing simple arithmetic (addition and multiplication).
This should not be taken to mean that if such a machine is confronted
with such an unsolvable problem, it will never stop working on the
problem, since the machine could easily be designed not to work on
any problem past a certain time limit. Also, this limitation does not
apply if the machine is allowed to be inconsistent; but, of course, with
an inconsistent machine one cannot be sure that the answer the machine
produces is correct. Whether this is true of human beings,
whether there are problems that natural intelligence can never solve,
and can never prove to be unsolvable, is an open question: It can
be answered only in a scientific-mathematical way if it is shown that
natural intelligence can be mathematically described; if it can be
mathematically described, then problems of this sort probably exist.

These  limitations  on  the  generality  of  artificial  intelligence,  which 
have  to  do  with  the  capacities  of  mathematical  description  and  the 
existence  of  mechanically  unsolvable  problems,  are  both  of  a  very 
theoretical  yet  vague  nature.  They  really  say  nothing  very  concrete  about 
the  real-world  capabilities  of  machines  (or  of  people).  We  would  do 
well,  therefore,  to  investigate  more  specific  limits  on  the  computational 
abilities  of  machines.  The  remainder  of  this  chapter  is  devoted  to  a 
discussion  of  the  physical  boundaries  of  the  computational  abilities  of 
machines,  and  to  establishing  certain  “conventions”  regarding  these 
boundaries,  which  are  referred  to  (for  illustrative  purposes  only — the 
boundaries  are  not  exact)  throughout  the  rest  of  this  book. 

To establish these conventions, note that there are three basic
ways in which the description of Turing machines has, so far, been
unrealistic¹⁴:

1. No real-world Turing machine can actually have an infinite
tape, or even a truly “potentially infinite” tape; there are
limits to how much “information” can be stored in a computer
memory.

2. Any real-world Turing machine must conduct each of its
actions (reading the tape, evaluating the next-move function,
printing the tape, moving the tape) in a finite, nonzero time;
there are limits to how fast a computer can operate.

3. Any real-world machine must conduct each of its actions
with a nonzero “probability of error.” Thus, in reading the
tape there must be a nonzero probability that the Tm control
will be incorrectly informed as to which symbol is actually
on the tape square being examined by the tapehead. Similarly,
there must be a nonzero probability that the next-move
function will be misevaluated, etc. Thus, there are limits to
the accuracy with which a given computer can operate.

14 Ignore the fact that modern computers operate on a higher “level” than
Turing machines (see the discussion of machine languages in Chapter 7).

Let  us  stress  that,  essentially,  these  same  physical  limitations 
apply  to  all  real-world  computers,  not  just  to  Turing  machines. 

The third limitation of machines means that real-world computers
are actually probabilistic (perhaps nondeterministic; see Hopcroft and
Ullman, 1969, and Manna, 1970b). In effect, any real-world machine
is capable of errors in any computation it makes (so, in a sense,
machines are inherently “inconsistent”). However, the inaccuracy of
machines may often be minimized; in particular, it is often possible to
build machines that are more “reliable” than their components, in
terms of the accuracy with which they compute their respective
functions. (The classic paper on this subject is that of von Neumann,
1956.) Although little will be said hereafter about the probabilistic
nature of machines, a reasonable convention for modern-day computers
is to assume that such a machine will normally make fewer than one
error per billion read-evaluate-print-move cycles.
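
As a rough illustration of how a machine can be more reliable than its
components, the following Python sketch computes the error probability of
a three-way majority vote. It is only a sketch under stated assumptions,
not the construction given in von Neumann's paper: the three components
are assumed to fail independently with the same probability p, the voting
element is assumed never to fail, and the function names are hypothetical.

```python
def majority_error(p: float) -> float:
    """Probability that a 2-out-of-3 majority vote gives the wrong answer.

    Assumes three independent components, each wrong with probability p,
    and a voter that never errs (both are simplifying assumptions).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # The vote is wrong when exactly two components err, or all three do.
    return 3 * p**2 * (1 - p) + p**3


def prob_any_error(p: float, cycles: int) -> float:
    """Probability of at least one error in `cycles` independent cycles."""
    return 1.0 - (1.0 - p) ** cycles


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(f"component error {p:g} -> voted error {majority_error(p):.3g}")
    # The convention in the text: about one error per billion cycles.
    chance = prob_any_error(1e-9, 10**9)
    print(f"chance of some error in 1e9 cycles at p = 1e-9: {chance:.3f}")
```

Under these assumptions the voted error is smaller than p whenever p is
below one half, which is the sense in which the whole can be more
reliable than its parts.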

To discuss the memory-size limitation of computers, a brief but
quantitative definition of the word “information” is needed. What does
it mean to say one computer memory will hold more information than
another? (Throughout this discussion we will be concerned only with
the memory that corresponds to the tape of the Tm, not with the
memory that corresponds to its finite-state control.) The qualitative
answer is fairly simple: The amount of information a tape (memory)
can hold is dependent on the number of squares that make up the tape
and on the number of symbols that may be printed on each square.*
Since the simplest tape is one for which each square may have printed
on it only one of two symbols (“blank” and “1”; “0” and “1”; etc.),
it is customary to take this kind of tape as a standard. The number of
squares that make up such a tape is referred to as the number of bits
(binary digits) of information that it can hold. To find the number of
bits of information that can be held by a given nonstandard tape, we
must figure out how large a standard tape must be in order to store as
many different strings of symbols as can be held by the nonstandard
tape.

* This is essentially the Shannon-Weaver (1949) concept of “information.”
A more intuitive approach to information would include some way of describing
the probable causes, effects, and denotations of a given string of symbols. This
is discussed more thoroughly in Chapter 7, but we may note here that there is
still no clearly satisfactory formalization for the intuitive concept of information.
Also, it is common to omit the “ceiling-function” and to allow information to
come in noninteger quantities of bits.

This is easily done. (Remember, any physical, real-world tape
can be made up of only a finite number of squares.) Suppose each of
the squares of the nonstandard tape is numbered successively: 1, 2, 3,
..., n. Let the number of symbols that can be printed on square i be
s(i); again, only one symbol may be printed on a square at a given
moment. Then the product

$S = s(1)\,s(2)\,s(3) \cdots s(n)$

is the total number of different strings of symbols that can be stored on
the given nonstandard tape. If x is a real number, we define the ceiling
function (see Knuth, 1969a) of x to be the least integer that is greater
than or equal to x. Denote the ceiling function of x by the expression
$\lceil x \rceil$. Thus, $\lceil 6.5 \rceil = 7$,
$\lceil -2.3 \rceil = \lceil -2 \rceil = -2$, $\lceil 0 \rceil = 0$, etc. The
reader may easily convince himself that the smallest standard tape
that can hold as many different strings of symbols as those held by the
nonstandard tape must have

$\lceil \log_2 S \rceil$

squares. We may therefore take this to be the amount of information
(in bits) that can be held by the nonstandard tape.*
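
The measure just described, the ceiling of the base-2 logarithm of the
product of the s(i), can be computed directly. The following Python
sketch is one way to do it; the function name tape_bits and the choice
of exact integer arithmetic are assumptions of the sketch, not something
taken from the text.

```python
from math import prod
from typing import Sequence


def tape_bits(symbols_per_square: Sequence[int]) -> int:
    """Return ceil(log2(S)), where S = s(1) * s(2) * ... * s(n).

    symbols_per_square[i] is the number of different symbols that may be
    printed on square i + 1 of the nonstandard tape.
    """
    if any(s < 1 for s in symbols_per_square):
        raise ValueError("each square must admit at least one symbol")
    total_strings = prod(symbols_per_square)  # S; an empty tape gives 1
    # For an integer S >= 1, ceil(log2(S)) equals (S - 1).bit_length().
    # Integer arithmetic avoids floating-point rounding for very large S.
    return (total_strings - 1).bit_length()


if __name__ == "__main__":
    print(tape_bits([2] * 8))       # a standard tape of 8 squares
    print(tape_bits([10, 10, 10]))  # three squares holding decimal digits
    print(tape_bits([3]))           # a single three-symbol square
```

For example, three decimal squares can hold 1000 different strings, so
the smallest standard tape that can hold as many has 10 squares, since
2^9 = 512 < 1000 <= 1024 = 2^10.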

* Of course this notion of information does not really depend on whether
the “tape” is actually made up of squares, on whether it is one-, two-, or
n-dimensional, or on whether “symbols” are “printed” or “stored” in some other
manner in the “squares” of the tape, etc.

Modern computing systems make use of many different types of
memory systems, each with its own characteristics. Some currently
accurate conventions for the storage capabilities of these systems are:
“core” memories may hold on the order of 10^ bits; “disk” memories
may hold on the order of 10® bits; magnetic tape memories may hold on
the order of 10® bits; optical (laser) memory systems currently in
development may hold between 10^® and 10^^ bits (see Damron et al.,
1968; R. P. Hunt et al., 1970; Lohman et al., 1971). It should be
noted that the access time necessary for a computer to determine what
symbol is stored at a given position (“square”) in a memory will, in
general, increase with the size of the memory. Thus, the access time for
a core memory is generally on the order of lO"*^ second, whereas for
an optical memory it is generally on the order of a second (see Chapter
8, “Hierarchical Systems”).

Probably the conventions used most often throughout this book
are those pertaining to limitation 2, that is, the speed with which a
computer can operate. The basic actions performed by a modern
computer are, in analogy to those performed by a Turing machine,
“read location(s) in memory,” “perform logical or arithmetical
operations,” “store result(s) in memory,” “access new location(s) in
memory.” The performance of this sequence of operations corresponds to a
cycle of the operation of the computer; in general, for each cycle of
operation, the computer processes one machine instruction (i.e.,
evaluates one instance of the next-move function). It should be emphasized
that, for most of the symbol-manipulation procedures in which we are
interested, a typical computer will usually have to process several
machine instructions to complete each step of the procedure (how many
depends upon the program, the collection of machine instructions, that is
being used to describe the procedure). We shall have occasion to make
use of several different conventions for the speed with which the steps of
a procedure can be carried out by a machine; each convention we use
will pertain to a different type of machine. These machines, and the
corresponding conventions, will be referred to as follows:

conventional  1  microsecond/step 

attainable  1  nanosecond/step 

theoretical  serial  10"^^  nanosecond/step 

theoretical  parallel  10"®®  nanosecond/step  or  years/steps 

Again,  these  are  rough  estimates.  Their  accuracy  and  meaning  will  now 
be  discussed. 

Conventional. Modern computers process about 10 million
instructions per second. It is estimated that, with optimal programming,
the average step involved in the type of nonnumerical computations we
are investigating (those that “simulate intelligence”) might require ten
machine instructions; probably this is conservative. For example,
generating a successor to a chessboard configuration might, with
extremely good machine-language programming, be done in 1 microsecond.
Using the “conventional” time estimate will give the student a
rough indication of the best speed he can expect a current computer to
achieve in performing a given procedure.

Attainable. Some integrated-circuit chips have been synthesized
which are small computers and memories. These chips typically have
operation and access times on the order of nanoseconds. Using circuitry
and computer chips specifically designed for a given procedure, it is
conceivable that the steps of that procedure might be performed at the
rate of 1 nanosecond/step. Should the time required for complete
execution of a procedure be very large, using the “attainable” estimate, the
student may conclude that current technology is not capable of
building a machine to perform the procedure. (However, it should be noted
that coherent optical systems may eventually be used to perform logical
operations at rates on the order of one picosecond (10^-12 second)
per operation; see Culver and Mehran, 1971.)
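
The “conventional” and “attainable” conventions lend themselves to
quick estimates. The Python sketch below derives the conventional figure
from the two numbers quoted in the text (10 million instructions per
second, ten instructions per step) and converts a step count into running
time. The names, the example step count, and the use of a 365.25-day year
are assumptions of the sketch; the two theoretical conventions could be
added to the table in the same way.

```python
# Seconds per procedure step under two of the conventions in the text.
SECONDS_PER_STEP = {
    "conventional": 1e-6,  # 1 microsecond/step
    "attainable": 1e-9,    # 1 nanosecond/step
}

# Assumption of this sketch: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def step_time(instructions_per_second: float,
              instructions_per_step: float) -> float:
    """Seconds needed for one step of a procedure."""
    return instructions_per_step / instructions_per_second


def running_time_years(steps: float, machine: str) -> float:
    """Years needed to perform `steps` steps on the named kind of machine."""
    return steps * SECONDS_PER_STEP[machine] / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 10 million instructions/second at ten instructions/step.
    print(f"conventional step time: {step_time(10e6, 10):.1e} second")
    # A hypothetical procedure requiring 10**20 steps.
    for machine in SECONDS_PER_STEP:
        years = running_time_years(1e20, machine)
        print(f"{machine}: about {years:.3g} years")
```

A procedure that needs thousands of years even at the “attainable”
rate is, by the reasoning of the text, beyond current technology.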

Theoretical Serial. Bledsoe (1961) used quantum theoretical
considerations to derive the minimum access time of a serial digital
computer (in which all information is passed through a central processing
unit) with a density less than or equal to 60 gm/cm^3. He obtained the
figure  10'"^  second  =  lO”^"  nanosecond.  Therefore  10“^^  nanosecond/ 
step  is  taken  as  the  best  speed  with  which  a  serial  computer  could 
perform  the  steps  of  a  given  procedure.  It  seems  likely  that  this  speed 
of  computation  is  completely  beyond  the  bounds  of  any  anticipated 
technology. 

Theoretical  Parallel.  Bremermann  (1967)  computed  the  maximum 
rate  at  which  information  can  be  processed  in  a  universe  of  10^^  protons, 
and  he  obtained  7  x  10^^^  bits/year.  This  estimate,  in  the  form  of  10"^® 
nanosecond/step  or  10"^''^  year/step,  is  used  as  the  maximum  speed 
with which the steps of any given computational procedure can
conceivably be performed. It is useful simply as a “clincher” to establish
whether a procedure is completely beyond the bounds of computation.

There  are  real-world  problems  for  which  the  only  procedures  we 
can  describe  that  would  yield  exact  solutions  cannot  be  carried  out: 
The  performance  of  these  procedures  is  beyond  even  the  “theoretical 
parallel” bound to computational ability. (One such problem is the
game  of  go;  see  Chapter  4.)  However,  it  should  be  emphasized  that 
for  most  problems  there  are  several  procedures  for  arriving  at  (perhaps 
partial)  solutions,  and  it  is  possible  for  some  to  be  within  bounds  and 
others  to  be  out.  Similarly,  it  is  usually  possible  to  describe  a  given 
procedure  with  several  different  programs,  some  of  which  may  be  more 
quickly  executed  by  a  given  machine  than  others. 

Because  so  little  is  known  about  the  functioning  of  the  human 
brain,  it  is  difficult  to  compare  its  physical  limitations  with  those  of 
computers.  The  consensus  seems  to  be  that  the  brain  has  a  larger 
memory  than  that  of  the  computer  but  that  it  performs  its  logical 
operations  (whatever  they  are)  much  more  slowly  (on  the  order  of 
milliseconds/operation).  The  slowness  of  the  brain’s  operation  seems 
to be relatively unimportant if we consider the complexity of its
structure and the fact that it is highly parallel; these attributes probably
account for its evident ability to perform extremely complex logical
operations at about the same speed with which it performs simpler
reflexes.


SUMMARY 

We  have  seen  that  there  are  limits  to  the  things  that  computers 
can  be  used  to  simulate,  to  the  problems  they  can  be  used  to  solve,  and 
to  the  procedures  they  can  perform.  However,  our  knowledge  of  these 
limits  and  of  natural  intelligence  is  not  sufficient  to  determine  whether 
the  attainment  of  a  general  artificial  intelligence  is  within  the  bounds  of 
computational  ability.  AI  researchers  still  do  not  have  enough  evidence 
to  decide  whether  machines  can  be  made  as  intelligent  as  human  beings. 


NOTES 

2-1. In fact, we have here described what might be called simple
mathematical theories. We may define a general mathematical theory to be such
that its three sets are finitely describable. At any rate, the object described
by the theory is still finitely describable.

2-2. The proposition that if a thing is finitely describable it is therefore
mathematically describable is generally taken as a postulate of the
philosophy of mathematics, since it has not been proved mathematically.
Mathematics seems incapable of supplying or handling a nonmathematical
definition for the concept of “finitely describable.” The evidence so far is
clear, however, that all mathematical ways of formalizing the concept of
“finitely describable” are equivalent.

From the mathematician’s point of view, the thing is often identified
with the set of sentences describing it. One could say in this sense that the
natural numbers do not exist separately from the axioms that generate the
set of sentences describing them. The notion of mathematical description is
an effective notion, something like “approximation”: One can derive as
many of the truths about the thing described as one likes, though one
cannot necessarily derive all such truths, in a finite time.

2-4. Actually, this is not quite true. One can extend the arguments of
Chaitin  (1966,  1969)  to  prove  that  there  exists  a  finite  set  A  with,  say, 
10  elements,  such  that  any  finite  description  of  A  requires  at  least  as 
many  elements  (production-rules;  see  Chapter  7)  as  there  are  in  A.  That 
is,  the  smallest  finite  description  of  A  is  the  enumeration  of  the  elements  in 
A.  However,  the  actual  enumeration  of  the  10^®®  elements  in  A  is  physically 
impossible. That is, the set A is finite, but is “practically infinite, practically
nondescribable.”

The proof for the existence of such a set would, of necessity, be
nonconstructive. Existence proofs of this type are not considered valid by some
mathematicians (note 2-5).

2-5. (A set is countable if its elements may be put into one-to-one
correspondence with the natural numbers 1, 2, 3, .... The set of sentences
derivable within a given mathematical theory must be countable, since for
each n there can be only a finite number of sentences of length n or less.)

This conclusion was first drawn from the work of Georg Cantor.
Naturally, it aroused much controversy, and there are many mathematicians
today who disagree with Cantorism (see Kac and Ulam, 1968, pp. 12-14).
In particular, there is no unanimous viewpoint among mathematicians as
to the proper rules for reasoning about “infinity” or even, for that matter,
as to the existence of infinite things (see Benacerraf and Putnam, 1964).
Thus, the argument concerning the existence of mathematically
nondescribable numbers is not a proof, especially if one does not grant the a
priori validity of the infinity concept. Hilbert (in Benacerraf and Putnam,
1964, p. 136) argued that the results of scientific investigation have given
no evidence for the existence of infinite things. The viewpoint in this book
is that scientific investigation has given no incontrovertible evidence
concerning either the existence or nonexistence of infinite things.

It should be pointed out that some scientists have disputed the
completeness of mechanistic reasoning, using quantum theoretical arguments
(see, e.g., Elsasser, 1969).

2-6.  Certain  aspects  of  the  universe  are,  according  to  current  scientific 
theories,  described  as  being  finite.  According  to  relativity  theory,  there  is  a 
maximum  possible  velocity,  that  of  light,  although  certain  phase  velocities 
can  be  greater.  Albert  Einstein  suspected  that  the  spatial  size  of  the  universe 
might  be  bounded,  and  estimated  a  figure  for  its  radius. 

2-7. The definition of discrete phenomena does not require that an
occurrence have a “first” event. Even so, it is possible to make a descriptive
string, with a beginning, for an occurrence with neither a first nor a last event.
(How?)

2-8. Of course no one can observe precisely that an event starts
“immediately after” some time t; what this means is that there are different step
functions that describe occurrences which appear to be the same.
(Examples?) This also applies, of course, to “immediately before,” and “during
the instant.”

2-9. This seems, incidentally, not to be the case with most nondiscrete
functions. For these, the best one can usually do is to approximate the
value at a given point to an arbitrary closeness in a finite number of steps.
Even in the case of Fig. 2-2, “the weight on a spring,” we really deal with
a type of approximation: The differential equation that describes the class of
all possible occurrence functions does have exactly one solution for any
given assignment of values to its variables (m, a, k), but in order to evaluate
that solution and to see where the weight will be at a certain time t, we
usually have to compute certain functions (sine, cosine, etc.) that yield
approximations. Typically, a finite description of an occurrence of a
nondiscrete phenomenon will give exact information at a finite number of points
(values of t) and information that is an arbitrarily exact approximation at an
infinite number of points. Something of the reverse holds for discrete
phenomena: A finite description will often give exact information about an
infinite  number  of  values  of  t,  of  which  at  most  only  a  finite  number  are  “ap¬ 
proximate”;  for  example,  2,^-1-, 4-I-. 

2-10. The observant reader may object that surely one cannot represent
phenomena that have a nondenumerable number of occurrences by
descriptions which yield a denumerable number of occurrences. To (partially)
answer this objection, consider Example 2-1, “the weight on a spring”: This
phenomenon may presumably have a nondenumerable number of
occurrences. However, the set of occurrences one can actually compute, using its
description (the differential equation), is denumerable, for three reasons:

a. Each computable occurrence is specified by listing the values for
the variables m, a, k, and the accuracy with which one wishes to
evaluate the equation.

b. Each of these values must be finitely described, and the finitely
describable numbers are a countable set.

c. A finite product of countable sets is countable.

Thus, although the description of the phenomenon applies in an “ideal”
sense to an uncountable number of occurrences, it actually describes only a
countable set.
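
Reason (c) can be made concrete: a product of finitely many countable
sets is countable because fixed-length tuples of natural numbers can be
numbered one after another. The Python sketch below does this with the
standard Cantor pairing function, treating each of the four values
(m, a, k, and the accuracy) as the index of its finite description. The
function names are illustrative and are not taken from the text.

```python
from math import isqrt
from typing import Tuple


def pair(x: int, y: int) -> int:
    """Cantor pairing: a one-to-one map from pairs of naturals onto naturals."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z: int) -> Tuple[int, int]:
    """Inverse of pair()."""
    w = (isqrt(8 * z + 1) - 1) // 2   # number of the diagonal containing z
    y = z - w * (w + 1) // 2          # position along that diagonal
    return w - y, y


def encode(values: Tuple[int, ...]) -> int:
    """Fold a nonempty fixed-length tuple into one natural number."""
    code = values[0]
    for v in values[1:]:
        code = pair(code, v)
    return code


def decode(code: int, length: int) -> Tuple[int, ...]:
    """Recover the tuple of the given length from its code."""
    out = []
    for _ in range(length - 1):
        code, v = unpair(code)
        out.append(v)
    out.append(code)
    return tuple(reversed(out))


if __name__ == "__main__":
    occurrence = (3, 1, 4, 1)  # hypothetical indices for m, a, k, accuracy
    n = encode(occurrence)
    print(n, decode(n, 4))
```

Because every 4-tuple receives its own natural number and can be
recovered from it, the computable occurrences can be listed in a single
sequence.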

2-11. One of the subplots of Kurt Vonnegut’s novel, The Sirens of
Titan, is relevant: The hero is part of an army trapped on Mars. Most of
the soldiers in the army have radio receivers implanted in their brains and
are remote-controlled by a person who has decided (for reasons extraneous
to this discussion) to have them invade Earth. The hero manages to
discover what is happening, despite the fact that he has a radio receiver
implanted in his own brain, and he writes a letter to himself describing
everything he knows about the invasion. After he hides his letter, his dislike for
the army is found out. Surgical officers in the army erase a great deal of
his memory, but after he returns to duty he discovers his letter. Reading it,
he is able to replenish his memory and begin again. This cycle repeats several
times.

EXERCISES 

2-1. (a) Find a finite description for the set of points that are the intersections
of the 24 Archimedean spirals, having equations of the form

$r = \pm\left(\theta + (k\pi/6)\right)$

where k = 0, 1, 2, ..., 11. (b) Find a finite description for the set of points formed
by the analogous intersection of 24 exponential spirals, with equations of the
form

$r = \pm\left(e^{\theta + (k\pi/6)}\right)$

2-2. Construct a next-move function for a “unary multiplier,” which, given an
input string

... b b 1 1 ... 1 b 1 1 ... 1 b b b ...
        (m 1's)     (n 1's)

consisting of a string of m 1’s followed by a string of n 1’s, produces on its tape
an output string containing mn 1’s.

2-3. (a) Show that any Turing machine can be represented by a natural
number (an integer greater than zero). (b) Give a finite description for a function
f mapping the natural numbers into [0,1], such that f(n) cannot be computed
for any n by any Turing machine.

[Figure: two tapes, labeled t1 and t2, each containing a single right-justified block of 1’s.]

2-4. Consider a simple “polycephalic” Turing machine which has two tapes,
t1 and t2, each of which is filled completely by 0’s except for a single block of
1’s. Let the blocks of 1’s on the two tapes be right-justified, as indicated above.
Find the simplest possible next-move function that will enable an outside
observer to determine whether or not the number of 1’s on tape t1 is greater than
or equal to the number of 1’s on tape t2, assuming that he cannot observe the
state q of the Turing machine.

2-5. In 1962 there were on this planet about 55,000 scientific journals publishing
about  1,200,000  articles  per  year;  there  were  also  60,000  scientific  books  and 
100,000  other  research  reports  issued  per  year  (in  the  United  States,  scientific 
and  technical  publications  have  doubled  in  bulk  approximately  every  20  years 
since  1800).  Estimate  the  size,  in  bits,  of  a  computer  memory  capable  of  storing 
(a)  all  scientific  publications  produced  in  1962,  and  (b)  all  scientific  publications 
produced  as  of  the  present.  Assume 
30  pages  per  article 
300  pages  per  book 
100  pages  per  research  report 
60  lines  of  print  per  page 
70  symbols  per  line 

and  assume  each  symbol  can  be  any  of  128  different  characters.  How  fast  must 
one  add  to  such  a  memory,  to  keep  it  up  to  date? 


Dragon  maze.  (Courtesy  of  D.  Ingalls,  Xerox  Palo  Alto  Research  Center.) 
PROBLEM SOLVING


INTRODUCTION 

The opening sections of this chapter present a brief overview of
the directions currently being taken by artificial-intelligence (AI)
research and of the subjects that will be covered in subsequent chapters.
The third section of this chapter describes some of the ways that AI
researchers have formalized the concept of “problem.” In the
succeeding sections general problem solvers and reasoning programs,
state-space problems, and heuristic search theory are discussed. Planning,
learning, and reasoning by analogy are then introduced briefly. The
final section is concerned with models, the “problem of problem
representation,” and the levels of competence that have been attained by
artificial intelligences.


PARADIGMS 
General  Approaches 

Speculations on the possibility of a search for mechanical
intelligence were originally put forth by several individuals (including Alan
Turing, John von Neumann, and Norbert Wiener) during the years
1943-1950; however, it was not until electronic digital computers
became generally available in the early 1950s that experimental research
in artificial intelligence could begin. Rapid progress in AI research did
not occur until “symbolic processing languages,” such as IPL and LISP,
were developed (note 3-1). To date there have been several thousand
papers published on the subject of artificial intelligence. However, AI
research is still in its embryonic state, and we cannot yet decide what its
final form will be. Thus, this book must serve both as an introduction
to a vast body of literature and as a commentary on what appear to be the
central topics discussed in that literature.

Kuhn (1962) discussed the importance of paradigms in the
development of scientific investigations. (A paradigm is a general
model of something that is found to be useful for investigating that
thing.) AI researchers have developed many paradigms for artificial
intelligence. Chapter 1 has already discussed one, which is represented
by Turing’s test: Artificial intelligence research is concerned with
building machines that can perform tasks which people would ordinarily say
require the “intellectual abilities” of a human being.

Environments 

Another way of viewing AI research is to see it as an effort to design
machines that are capable of existing on their own in environments
produced by the real world. An “environment produced by the real
world” (or a real-world environment) is not necessarily our own
environment. A mechanical intelligence might, for instance, operate in an
environment consisting of “all published scientific works.” Intuitively,
an environment produced by the real world is always changing and does
not have a known, complete description or prediction. We expect such
an environment to exhibit regularities, or “patterns,” and we expect a
machine that operates in such an environment to encounter “problems.”
A machine operating successfully in a real-world environment will
have to develop and represent internally its own “knowledge” of that
environment. It may have to discover largely on its own the problems
it needs to solve and the patterns it needs to recognize. If we design the
environment ourselves, then many of these problems and patterns may
be presented to the machine automatically (as with question-answering
and fact-retrieving machines; see Chapter 7). Even if we do not design
the environment, we may still know enough about it to give the machine
automatic procedures for locating relevant problems and patterns. At
any rate, the machine will have to be able to solve the problems and to
perceive the patterns that it encounters.

Throughout this chapter we shall have much to say about the
general nature of machines that are capable of existing in real-world
environments. It is convenient to say that a machine which is capable
of operating successfully in a real-world environment displays an
aptitude for that environment. Also, reference is often made to
environments simply as “problem domains” or “problem areas.”

Aptitudes 

A closely related paradigm for AI research is to see it as being
concerned with frameworks for the engineering of mechanical aptitudes.
By this we mean that AI research can be viewed as an attempt to develop
the computers and other hardware, programming languages, and human
expertise necessary to design machines with aptitudes for specific
real-world environments. This viewpoint springs from the recognition that
some procedures (machines) will appear to be intelligent in some
environments and unintelligent in others. Rather than search for a
procedure that will be intelligent in all (or even many) environments, AI
researchers may look for a framework (computer, language, expertise)
within which to design procedures that can be tailored for intelligence
in specific environments. Although AI researchers pursuing this paradigm
are concerned with developing intelligent machines, they are more
concerned with finding programming languages and computers that will
facilitate the development and description of a wide variety of different
intelligent machines, each with its own aptitude for solving problems in
a real-world environment (some machines may have many of the
aptitudes possessed by others). Many investigators have worked within
this paradigm, too many for us to identify at this time all those who
have made important contributions. Chapters 6, 7, and 8 are, in effect,
a discussion of the work that has been done using this paradigm.

The idea of mechanical aptitudes is a valuable one, whether or
not we seek a general framework within which to design them. Most
AI researchers have not aimed directly at the goal of constructing
completely intelligent machines, able to display intelligence at a human
level. Rather, most work in artificial intelligence has been devoted to
the machine simulation of specific intellectual abilities (giving machines
specific aptitudes) such as the ability to play games or the ability to
prove mathematical theorems. There are three basic reasons for this
approach:

First,  the  theoretical  and  practical  knowledge  necessary  to  do 
really  general  work  was  (and  is)  extremely  limited.  There  is  no 
adequate  guideline  that  can  tell  us  in  any  detail  how  to  build  machines 
with  a  truly  general  artificial  intelligence. 

Second, one of the best ways to acquire this sort of information
is to make a thorough comparison of human and machine abilities in
limited problem areas or environments. The precise nature of the
difficulties involved in AI research may show up more clearly if we confine
our early inquiries to the simulation of specific aptitudes possessed by
natural intelligence. Hopefully, many limited attempts at machine
intelligence will eventually provide better grounds for generalization.

Finally, there is always pressure for some immediate results, both
to solve current and practical problems that do not require general
mechanisms for artificial intelligence, such as character recognition
or assembly-line balancing (see, for example, Tonge, 1963), and to
establish by experiment a likelihood that the more general attempts will
eventually succeed.^

The  specific  machine  aptitudes  that  have  received  the  most  investi¬ 
gation  by  AI  researchers  are  problem  solving,  game  playing,  pattern 
recognizing,  theorem  proving,  and  language  understanding.  Two  facts 
concerning “specific” machine aptitudes should be emphasized: First,
there  are  levels  of  generality  in  the  aptitudes  that  machines  may  possess. 
Thus,  one  procedure  may  have  an  aptitude  for  playing  a  specific  game, 
such  as  Chess,  and  another  procedure  may  have  an  aptitude  for  playing 
many  different  games;  procedures  with  a  “specific”  aptitude  for  playing 
many  different  games  are  said  to  be  general  game-playing  procedures. 
Similarly, a program with an aptitude for solving many different problems
is called a “general” problem solver (it is not required that the
program  be  able  to  solve  all  problems,  or  even  that  the  problems  it 
is  able  to  solve  be  especially  difficult).  AI  research  has  so  far  had  only 
limited  success  in  developing  general  problem  solvers,  general  game 
players,  general  pattern  recognizers,  and  general  theorem  provers.  No 
procedures  have  yet  been  developed  which  we  could  fairly  say  are 
“general language understanders” (note 3-2).

The  other  fact  that  should  be  emphasized  concerns  the  inter¬ 
dependence  of  aptitudes.  Throughout  this  book  we  shall  see  many 
ways  in  which  machines  with  one  general  aptitude  must  have  other 
(perhaps  less  general)  aptitudes.  Thus,  it  can  be  shown  that  general 
game players must have an aptitude for pattern recognizing (see, e.g.,
Banerji,  1969),  and  general  pattern-recognizing  programs  must  have 
an  aptitude  for  language  understanding  (see  Chapters  5  and  7). 

Evolutionary  and  Reasoning  Programs 

The term general artificial intelligence, as it is used here, refers
loosely to a machine (procedure) that has aptitudes for general
problem solving, general game playing, general theorem proving, general
pattern recognizing, and general language understanding, and also
has aptitudes enabling it to display all other kinds of intelligent
behavior normally exhibited by people. Again, no one has yet been
successful in giving a machine the aptitudes corresponding to general
artificial intelligence. There have been primarily two types of
approach to the goal of achieving a general machine intelligence, one of
which is an evolutionary approach, the other being what is called
(following McCarthy) the “reasoning program” approach. These
approaches are not mutually exclusive, but as yet they have not been
combined: Thus, one can imagine reasoning programs that might
change their rules of inference and “evolve,” and one can imagine
interrelating reasoning programs that would form a “self-organizing”
whole, which might itself be a reasoning program (see Chapter 8).

^ We discuss the practical uses and effects of general artificial intelligence
more thoroughly in Chapter 9.

The  evolutionary  programs,  such  as  those  written  by  Friedberg 
et  al.  (1958,  1959)  and  Fogel  et  al.  (1966)  and  suggested  by  Holland 
(1970)  and  Campbell  (1960),  are  programs  that  produce,  select,  and 
modify  subprograms  according  to  their  ability  to  perform  various 
tasks.  There  is  no  reason  in  theory  why  evolutionary  programs  might 
not  eventually  be  used  to  produce  a  general  artificial  intelligence,  but 
as  yet  the  evolutionary  approach  has  had  little  success. 

The  reasoning-program  approach  is  an  attempt  to  develop  a  single 
program  capable  of  perceiving  facts  about  its  environment,  of  drawing 
conclusions  from  facts,  of  discovering  an  adequate  means  for  the  ex¬ 
pression  of  facts,  of  formulating  its  own  goals  and  strategies,  and  acting 
according  to  them — a  program  that  would,  in  short,  be  a  rational 
entity.  The  most  well-known  example  of  this  approach  is  probably  the 
“General  Problem  Solver”  of  Newell,  Shaw,  and  Simon  (1963),  which 
might  be  described  as  a  preliminary  investigation  of  the  rational  process. 
McCarthy  (1963a,b;  with  Hayes,  1968a)  took  a  somewhat  different 
approach  to  the  same  goal,  concentrating  particularly  on  what  sort  of 
internal  language  (means  for  expressing  facts)  would  be  the  best  for  a 
reasoning  program. 

This  chapter  discusses  the  work  of  McCarthy,  Newell,  Shaw, 
Simon,  Ernst,  Nilsson,  Amarel,  Hewitt,  Fikes,  Pohl,  and  others,  rele¬ 
vant  to  the  construction  of  reasoning  programs  and  to  giving  machines 
a  “specific”  aptitude  for  general  problem  solving. 

PARADIGMS  FOR  THE  CONCEPT  OF 
“PROBLEM” 

Situation-Space 

What is a problem? Perhaps the best answer AI researchers can give
is that the real-world nature of “problems” still has not been either fully
formalized or fully investigated. We can, however, describe two basic
models, or paradigms, for the concept of “problem,” which would have
to be included in any fully general formalization.

Our  first  general  paradigm  for  “problem”  is  the  situation-space 
model.  A  problem  presented  in  this  formalization  consists  of  an  initial 
situation,  a  set  of  possible  situations,  and  a  set  of  possible  actions,  to¬ 
gether  with  a  specification  of  how  the  various  situations  can  be  pro¬ 
duced  from  each  other  by  different  actions,  and  the  specification  of  a 
final,  desired  situation,  or  goal;  the  statement  of  the  problem  might  also 
include  a  specification  of  certain  situations  to  be  avoided.  A  solution 
to  a  situation-space  problem,  then,  is  any  sequence  of  actions  that  leads 
from  the  initial  situation  to  the  desired  situation,  and  avoids  the  unde¬ 
sired  situations. 
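
The definition above can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not a program from the literature: the names `solve`, `step`, and `jump`, and the toy problem itself (positions 0 to 5 on a line, with position 3 to be avoided), are hypothetical. It searches breadth-first for a sequence of actions leading from the initial situation to a goal while never entering an undesired situation.

```python
from collections import deque


def solve(initial, actions, is_goal, is_forbidden=lambda s: False):
    """Breadth-first search for a shortest sequence of action names.

    `actions` maps an action name to a function that takes a situation and
    returns the resulting situation, or None if the action does not apply.
    """
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        situation, path = frontier.popleft()
        if is_goal(situation):
            return path
        for name, apply_action in actions.items():
            nxt = apply_action(situation)
            if nxt is None or nxt in seen or is_forbidden(nxt):
                continue
            seen.add(nxt)
            frontier.append((nxt, path + [name]))
    return None  # no sequence of actions reaches a goal


# Toy problem (an assumption for illustration): positions 0..5 on a line.
def step(pos):
    return pos + 1 if pos + 1 <= 5 else None


def jump(pos):
    return pos + 2 if pos + 2 <= 5 else None


ACTIONS = {"step": step, "jump": jump}

if __name__ == "__main__":
    # Goal: reach position 5 without ever standing on position 3.
    print(solve(0, ACTIONS, lambda s: s == 5, lambda s: s == 3))
```

Note that this sketch assumes completely specified situations and deterministic actions; the additions discussed next relax exactly those assumptions.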

Several  additions  should  be  made  to  this  model  for  “problem”  if 
we  are  to  insure  some  generality  in  its  application  to  the  real  world. 
First,  we  should  allow  the  situations  of  a  given  situation-space  to  be 
partially-specified;  that  is,  we  should  not  require  in  general  that  a 
complete  description  be  obtainable  for  any  given  situation  (though  for 
the simpler problems so far considered in AI research such descriptions
are  usually  available);  rather  we  might  allow  a  given  situation  to  be 
described  by  a  set  of  sentences,  each  presenting  a  fact  about  the  situa¬ 
tion  from  which  new  sentences  may  possibly  be  derived.  The  set  of 
sentences  describing  a  given  situation  may  be  incomplete;  that  is,  one 
may  not  be  able  to  answer  all  conceivable  questions  about  the  situation. 
Again,  the  result  of  applying  an  action  to  a  given  situation  will  not 
necessarily  be  a  completely  specified  situation.  In  the  same  vein,  the  goal 
to  be  obtained  by  solution  of  the  problem  may  be  only  partially-specified. 

Also, in full generality we would not require that the result of applying
an action to a given situation necessarily be a unique situation, or
even a unique partially-specified situation. That is, we should allow
actions to be “nondeterministic” in their consequences, sometimes yielding
one partially-specified situation out of a set of partially-specified
situations.

Finally,  a  solution  to  a  situation-space  problem  may  in  general  be 
partially-specified;  that  is,  the  solution  may  be  described  as  dependent 
on  various  contingencies  that  cannot  be  completely  determined  in  ad¬ 
vance.  For  example:  “If  X  should  become  a  factor  then  do  T,”  “If  Z 
should  happen,  then  formulate  a  new  solution.”  Thus,  the  solution  may 
in  general  be  a  plan,  or  strategy,  not  a  specific  string  of  actions.  The 
various  actions  that  might  be  included  in  a  given  solution  should  include 
“looking for a new solution”; “discovering more information about
relevant situations”; and “interrupting one’s actions, not doing anything.”

A good example of a real-world problem that is partially-specified
is  McCarthy’s  Airport  Problem:  The  problem  consists  in  going  from 
one’s  home  to  the  airport.  Two  basic  actions  are  available,  driving  and 
walking.  To  solve  the  problem,  one  starts  at  home,  walks  to  one’s  car, 
and  drives  to  the  airport.  However,  in  reality  one  cannot  specify  com¬ 
pletely  and  invariably  all  of  the  details  of  the  situations  and  actions  that 
may  occur  in  solving  the  problem;  so,  a  single  string  of  actions  cannot 
be produced which is a guaranteed solution. One could, for example:

Break  one’s  leg  going  to  the  car. 

Have  a  flat  tire  while  driving  to  the  airport. 

Misread  a  highway-direction  sign  and  get  lost. 

Run  out  of  gas  or  have  engine  trouble. 

Come  to  a  roadblock  or  a  detour. 

A  machine  attempting  to  solve  the  Airport  Problem  could  run  into 
similar  difficulties,  yet  these  are  all  obstacles  a  general  intelligence  could 
surmount  (though  in  doing  so  it  might  need  to  enlist  the  aid  of  other 
intelligences).  The  nature  of  this  problem’s  difficulty  lies  in  the  partial- 
specification  of  its  situations,  actions,  and  solutions.  This  is  true  of  most 
problems  in  the  real  world. 
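
The contrast between a fixed string of actions and a strategy can be sketched in a few lines. Everything in the code below is a hypothetical illustration: the outcome table, the situation names, and the recovery actions (`call_taxi`, `call_for_help`) are assumptions made for the example, not part of McCarthy's statement of the problem. Each action may yield any one of a set of situations, and a strategy is represented as a table telling the solver what to do in each situation it may find itself in.

```python
# Hypothetical outcome table: (situation, action) -> set of possible results.
OUTCOMES = {
    ("home", "walk"): {"car", "broken_leg"},
    ("car", "drive"): {"airport", "flat_tire"},
    ("flat_tire", "call_taxi"): {"airport"},
    ("broken_leg", "call_for_help"): {"airport"},
}


def always_reaches(policy, situation, goal, limit=10):
    """True if following `policy` reaches `goal` whichever outcome occurs.

    `policy` maps a situation to the action chosen in it; `limit` bounds the
    number of actions so that the check always terminates.
    """
    if situation == goal:
        return True
    if limit == 0 or situation not in policy:
        return False
    results = OUTCOMES.get((situation, policy[situation]), set())
    return bool(results) and all(
        always_reaches(policy, r, goal, limit - 1) for r in results
    )


# A single string of actions: walk to the car, then drive.
FIXED_STRING = {"home": "walk", "car": "drive"}

# A strategy: the same actions plus responses to contingencies.
STRATEGY = dict(FIXED_STRING, flat_tire="call_taxi", broken_leg="call_for_help")

if __name__ == "__main__":
    print(always_reaches(FIXED_STRING, "home", "airport"))
    print(always_reaches(STRATEGY, "home", "airport"))
```

Under these assumptions the fixed string is not a guaranteed solution, since it says nothing about a broken leg or a flat tire, whereas the strategy is.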


System  Inference 

Our  other  paradigm  for  “problem”  is  the  paradigm  of  system 
inference.  Problems  in  this  paradigm  may  take  many  different  forms  of 
representation, all of them theoretically equivalent, though a machine
working  within  this  paradigm  might  sometimes  find  the  use  of  one 
representation  to  be  more  efficient  than  the  use  of  another.  Various 
forms  of  “system  inference”  would  respectively  require  a  problem-solv¬ 
ing  machine  to  be  capable  of  inferring: 

1. A function f from a set A to a set B, given examples of the
function’s values for a subset of A.

2.  A  relation  R  within  a  set  X,  given  a  description  of  X  and  a 
set  of  examples  (positive  or  negative)  of  the  way  R  holds 
throughout  X. 

3.  A  grammar  for  a  string  language  L,  given  a  set  of  sentences 
that  belong  to  L  and  a  set  of  sentences  that  do  not. 

4.  A  mathematical  theory,  given  a  set  of  propositions  that  are 
true  within  the  theory,  and  a  set  that  are  not. 

5.  A  Turing  machine  T,  given  a  sample  of  its  behavior  on  a  set 
of  input  strings. 

(Of course this list is not exhaustive.) The inference, or system, proposed
by a problem solver as a solution to one of these requirements
will  typically  be  a  finite  description  of  a  function,  relation,  string  gram¬ 
mar,  mathematical  theory,  or  Turing  machine. 

The  generality  of  this  paradigm  as  a  model  of  mathematical  prob¬ 
lems  should  be  strongly  suggested,  but  there  may  be  some  doubt  as  to 
its  relevance  to  the  real  world.  To  help  insure  this  relevance,  we  should 
allow  the  evidence  for  a  given  system-inference  problem  to  be  partially- 
specified  and  also  allow  the  solutions  (i.e.,  systems)  proposed  by  the 
problem-solving  machine  to  be  partially-specified.  Again,  the  machine 
should  have  some  language  for  representing  its  knowledge  of  a  given 
inference  problem,  and  it  should  have  some  way  of  determining  informa¬ 
tion  that  will  help  it  decide  among  the  various  systems  it  might  infer 
as  a  solution  to  a  problem.  It  will  often  be  the  case  that  a  machine  will 
be  able  to  infer  several  systems  consistent  with  the  evidence  it  has  been 
given. However, we would not require that it be able to derive its
inference(s) from the given evidence, nor even necessarily that it be
able to prove that its proposed solutions are consistent. Nevertheless,
a system-inference machine should be able to defer to experience and
not make an inference once it has recognized evidence that refutes it.
Also,  a  system-inference  machine  should  be  able  to  detect,  or  try  to 
detect,  that  its  evidence  is  self-contradictory,  and  it  should  usually  tend 
to  propose  increasingly  better  solutions. 
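
A minimal sketch of the first form of system inference (inferring a function from examples of its values) may help fix these requirements. It is an illustration only: the three candidate functions and the names `infer`, `consistent`, and `is_self_contradictory` are assumptions, and a real system-inference machine would have to generate its hypotheses rather than choose from a fixed table.

```python
def consistent(hypothesis, evidence):
    """A hypothesis is consistent if it reproduces every observed value."""
    return all(hypothesis(x) == y for x, y in evidence)


def is_self_contradictory(evidence):
    """Evidence contradicts itself if one input is paired with two outputs."""
    observed = {}
    for x, y in evidence:
        if x in observed and observed[x] != y:
            return True
        observed[x] = y
    return False


def infer(candidates, evidence):
    """Return the names of all candidate systems not refuted by the evidence,
    or None if the evidence is self-contradictory."""
    if is_self_contradictory(evidence):
        return None
    return [name for name, f in candidates.items() if consistent(f, evidence)]


# Hypothetical hypothesis space: three finite descriptions of functions.
CANDIDATES = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "add_two": lambda x: x + 2,
}

if __name__ == "__main__":
    print(infer(CANDIDATES, [(2, 4)]))          # several systems fit
    print(infer(CANDIDATES, [(2, 4), (3, 9)]))  # new evidence refutes some
    print(infer(CANDIDATES, [(2, 4), (2, 5)]))  # contradictory evidence
```

With the single example f(2) = 4 all three candidates remain; the further example f(3) = 9 refutes two of them, which the sketch then declines to propose.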

An  intuitive  example  of  a  real-world  system-inference  problem  is 
the  problem  of  invention:  That  is  to  say,  given  a  description  of  some 
task  to  be  performed  (peel  potatoes),  find  a  description  of  an  object 
that  will  perform  the  task  (draw  a  blueprint  for  an  automatic  potato- 
peeler).  The  task  to  be  performed  can  be  corresponded  to  a  function 
that  maps  situations  into  situations;  the  description  of  the  task  can  be 
corresponded  to  a  description  of  the  function  values  on  certain  inputs; 
and  the  invention  produced  by  the  problem-solving  machine  can  be 
corresponded  to  a  finite  description  of  a  function  (a  program  for  a 
universal Turing machine) that performs the task. An efficient mechanical
inventor should use what might be called the “principle of economy
of invention”: Do not design an invention to depend on other,
unachieved inventions if you can help it (note 3-3).

Of  course  no  one  has  built  a  “general  invention-making  machine,” 
but  the  possibility  is  clearly  in  line  with  the  notion  of  general  artificial 
intelligence. 

Actually,  each  of  these  paradigms  for  the  concept  of  “problem,” 
the  situation-space  model  and  the  system-inference  model,  is  equivalent 
to the other. It is likely that any problem which can be stated in one
paradigm can be stated in the other, and that each of these models is
merely  a  different  way  for  representing  the  same  underlying  idea  about 
the  general  nature  of  problem-solving  ability.  Still,  we  should  emphasize 
again  that  neither  paradigm  has  yet  been  completely  formalized  or 
investigated  as  regards  its  application  to  “problems  of  the  real  world.” 
Finally,  we  should  mention  that  for  any  problem,  there  are  essentially 
two  levels  of  solution:  The  first  level  is  to  prove  the  existence  of  a  solu¬ 
tion  to  the  problem,  and  the  second  level  is  to  construct  the  solution 
itself.  Polya  (1945)  presented  an  excellent  introduction  to  the  nature  of 
problems  and  their  solutions,  and  gave  attention  to  some  aspects  of 
real-world  problems. 


PROBLEM  SOLVERS,  REASONING 
PROGRAMS,  AND  LANGUAGES 

General  Problem  Solver 

The  rest  of  this  chapter  will  be  concerned  with  computer  programs 
that are capable of solving problems stated in the situation-space paradigm.
Programs that work with problems stated in other paradigms are
discussed  primarily  in  Chapter  7.  We  shall  see  the  situation-space  para¬ 
digm  used  in  Chapters  4  and  6  by  programs  which  play  games  and  prove 
theorems.  In  this  section  we  are  concerned  with  two  questions:  First, 
what  should  be  the  nature  of  a  machine  that  would  be  a  general  prob¬ 
lem-solver  for  problems  of  this  type?  Second,  how  should  a  machine  of 
this  type  be  designed  to  operate  in  a  real-world  environment  similar  to 
our  own? 

One example of a fairly general program for solving situation-space
problems is the General Problem Solver (GPS) program of
Newell, Shaw, Simon, and Ernst (1963 et seq.). GPS made use of an
elementary language for the description of situation-space problems.
That is, GPS was capable of accepting descriptions of objects and
operators (= situations and actions) and of accepting information that
a certain object was the initial, or given, object and that a certain object
was the desired object, or goal. The GPS language contained what might
be called the first degrees of partial specification: One could specify to
GPS that a class of objects (e.g., “any expression without an integral
sign”) was to be the goal, and the program could decide that some objects
would be considered partial solutions. This was done with the use
of difference operators, which were capable of detecting various types
of differences between objects. The differences were themselves also
treated as objects, and GPS could define subgoals of “changing the
difference” between two objects. Thus, GPS would seek to minimize one
difference at a time between two objects, and it usually was given an
ordering for the various differences: Minimizing one difference could
be considered more important than minimizing another.

GPS  used  the  same  problem-solving  technique  (referred  to  as 
means-ends  analysis  by  its  authors)  on  every  situation-space  problem 
it  was  given;  the  technique  comprised  three  essential  steps: 

1.  Evaluating  the  difference  between  the  current  situation  and 
the  goal. 

2.  Finding  an  operator  that  typically  lowers  the  type  of  dif¬ 
ference  found  in  step  1. 

3.  Checking  to  see  if  the  operator  found  in  step  2  can  be  ap¬ 
plied  to  the  current  situation;  if  it  can,  then  apply  it,  else 
determine  a  situation  required  for  the  application  of  that 
operator,  and  establish  it  as  a  new  (sub)  goal;  then  go  to 
step  1. 
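
The three steps above can be sketched as a short recursive procedure. This is a simplified illustration and not the GPS program itself: here a situation is a set of facts, a “difference” is simply a goal fact that does not yet hold, differences are not ordered by importance, and the `Operator` fields and the two airport operators are hypothetical names chosen for the example.

```python
from typing import NamedTuple


class Operator(NamedTuple):
    name: str
    needs: frozenset    # facts that must hold before the operator applies
    adds: frozenset     # facts the operator makes true
    removes: frozenset  # facts the operator makes false


def achieve_all(state, goals, operators, depth=10):
    """Reduce each difference in turn; return (new_state, plan) or None."""
    plan = []
    for goal in sorted(goals):
        result = achieve(state, goal, operators, depth)
        if result is None:
            return None
        state, steps = result
        plan += steps
    # Achieving a later goal may have undone an earlier one.
    return (state, plan) if goals <= state else None


def achieve(state, goal, operators, depth):
    if goal in state:            # step 1: no difference remains
        return state, []
    if depth == 0:               # give up on very deep subgoal chains
        return None
    for op in operators:         # step 2: find an operator for this difference
        if goal not in op.adds:
            continue
        # Step 3: if the operator cannot be applied yet, its requirements
        # become subgoals; then apply it.
        result = achieve_all(state, op.needs, operators, depth - 1)
        if result is None:
            continue
        ready, plan = result
        return (ready - op.removes) | op.adds, plan + [op.name]
    return None


OPERATORS = [
    Operator("walk_to_car", frozenset({"at_home"}),
             frozenset({"at_car"}), frozenset({"at_home"})),
    Operator("drive_to_airport", frozenset({"at_car"}),
             frozenset({"at_airport"}), frozenset({"at_car"})),
]
START = frozenset({"at_home"})
GOAL = frozenset({"at_airport"})

if __name__ == "__main__":
    print(achieve_all(START, GOAL, OPERATORS))
```

Like GPS as described here, the sketch can only return a specific sequence of actions, and it depends entirely on the representation (facts and operators) supplied by its programmer.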

GPS  was  applied  to  many  different  simple  problems,  such  as  the 
Missionary-Cannibals  Problem  (see  the  last  section)  and  the  Tower 
of  Hanoi  (see  the  Exercises).  It  was  also  shown  to  be  able  to  prove 
relatively  simple  theorems  in  mathematical  theories;  its  authors  were 
able  to  describe  the  resolution  principle  of  J.  A.  Robinson  (see  Chap¬ 
ter  6)  within  their  formalization  for  operators  and  objects.  On  all  of 
these (fairly simple) problems, GPS was successful, though usually it
was  not  as  fast  in  producing  answers  as  were  special  programs  designed 
to  solve  the  individual  problems. 

In several respects, GPS was not a fully general problem solver. In
the first place, GPS could not produce a plan or strategy as its solution;
the only solution GPS could produce would necessarily be a specific
sequence of actions that would lead to the desired goal. Also, GPS
could be applied only to problems that could be completely specified,
where the various actions, objects, differences, etc., could be exactly
described for the given problem. Thus, GPS was completely dependent
on the ability of its programmer to produce a suitable representation
for the problem.

As an example, GPS was given the famous Seven Bridges of
Konigsberg Problem (see Fig. 3-1). The problem is to go over each of
the seven bridges once and only once and return to the point from
which you started. This problem was shown to be unsolvable by Euler
in 1736, using certain topological considerations. When given the
problem, GPS tried the same paths repeatedly and eventually gave up,
unable to achieve a solution because it could not look at the problem
in a general way: It could not develop a partially specified solution, or
strategy, and then prove whether the strategy would work, nor could it
prove theorems about the problem or its solutions. Since GPS could not
invent Euler’s “topological considerations,” it could not prove the puzzle
to be unsolvable.

Figure 3-1. The seven bridges of Konigsberg.
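
For comparison, Euler's argument needs no search at all. In modern terms, a closed tour that crosses every bridge exactly once exists only if the land masses are connected and every land mass touches an even number of bridges. The sketch below checks that condition. The bridge list follows the standard historical layout (an island A joined twice to each bank B and C and once to the eastern land mass D, with D joined once to each bank); it is stated here as an assumption, since the figure is not reproduced in this text, and the function names are hypothetical.

```python
from collections import Counter

# Assumed layout of the seven bridges between land masses A, B, C, D.
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(edges):
    """Count how many bridge ends touch each land mass."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count


def is_connected(edges):
    """True if every land mass can be reached from every other."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for n in neighbors[node] - seen:
            seen.add(n)
            stack.append(n)
    return len(seen) == len(neighbors)


def has_closed_tour(edges):
    """Euler's condition: connected, and every degree is even."""
    return is_connected(edges) and all(
        d % 2 == 0 for d in degrees(edges).values()
    )


if __name__ == "__main__":
    print(dict(degrees(BRIDGES)))
    print(has_closed_tour(BRIDGES))
```

Every land mass has an odd number of bridges, so the check fails at once. The point of the example is that this test works on a different representation of the problem (a count of bridge ends) from the one a path search uses.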

Of  course  most  people  couldn’t  do  this  either,  or  at  least  not  right 
away;  otherwise  the  problem  would  never  have  become  famous.  Usu¬ 
ally,  the  first  thing  a  person  will  try  is  a  GPS-like  search.  However,  a 
person  can  stop  such  a  search  if  it  seems  to  be  fruitless,  and  can  try  to 
reason  about  the  problem  itself. 

All of which is to say that GPS was highly “representation dependent,”
more so than a truly general problem solver would be.^ We
should expect a representation-independent problem solver to be capable
of:

1.  Inventing  new  representations  for  a  given  problem,  if  it  can¬ 
not  solve  the  problem  using  the  ones  it  has. 

2.  Discovering  facts,  and  perhaps  proving  theorems,  about  rep¬ 
resentations  and  problems,  their  interrelations,  etc. 

3.  Asking  for,  and  looking  for,  help  in  the  outside  world. 

Each  of  these  abilities  would  be  necessary  to  a  problem  solver  that 
functions  in  the  real  world. 


Reasoning  Programs 

Following McCarthy and Hayes (1968a), let us label general
problem solvers that work within the situation-space paradigm, and
which possess independence from representations in this sense, as reasoning
programs (RP’s). At the moment, RP’s are still in the conceptual,
“thought-experiment” stages of development. We are primarily concerned
with RP’s that could solve situation-space problems that might
occur in a real-world environment similar to our own.

^ This criticism also applies to the more recent general problem solvers
such as FDS (Quinlan and Hunt, 1968), MULTIPLE (Slagle and Bursky, 1968),
and REF-ARF (Fikes, 1970). These programs are each capable of solving a variety
of different problems, but they are all highly representation-dependent.

Basically,  a  reasoning  program  is  to  be  capable  of  sensing  and 
operating  on  the  world  through  perhaps  several  means,  such  as  tele¬ 
vision  cameras  and  mechanical  arms,  and  of  communicating  with  peo¬ 
ple  through,  say,  keyboards  and  video  displays.  Its  observations  at  a 
given  moment  may  be  stored  internally  in  several  forms:  Pictures,  for 
example,  might  be  stored  as  matrices,  lists,  or  other  data-structures. 
However, any data stored by the RP is ultimately to be described,
within the RP, by sentences in a general language for the representation
of phenomena. RP should be capable of proving theorems about
phenomena,  stated  within  this  language,  and  of  deciding  what  actions 
to  perform  on  the  basis  of  these  theorems;  its  “phenomena  language” 
should  be  capable  of  describing  the  actions  it  can  perform,  as  well  as 
the  situations  it  can  observe,  and  of  describing  interrelations  between 
them.  The  language  should  be  capable  of  describing  hypothetical  situa¬ 
tions  and  actions,  of  designating  some  as  desirable  and  others  as  not. 
Finally,  the  phenomena  language  should  be  capable  of  describing  repre¬ 
sentations  of  problems,  as  well  as  problems  themselves:  RP  should  be 
capable  of  reasoning  about  its  representations  as  well  as  with  its 
representations,  as  described  above. 

A  language  is  essentially  a  way  of  representing  facts.  An  important 
question, then, is what kinds of facts are to be encountered by the RP
and  how  they  are  best  represented.  It  should  be  emphasized  that  the 
formalization  presented  in  Chapter  2  for  the  description  of  phenomena 
is not adequate to the needs of the RP. The formalization in Chapter 2
can  be  said  to  be  metaphysically  adequate,  insofar  as  the  real  world 
could  conceivably  be  described  by  some  statement  within  it;  however, 
it  is  not  epistemologically  adequate,  since  the  problems  encountered 
by an RP in the real world cannot be described very easily within it.
Two  other  examples  of  ways  of  describing  the  world,  which  could  be 
metaphysically  but  not  epistemologically  adequate,  are  as  follows: 

1.  The  world  as  a  quantum  mechanical  wave  function. 

2.  The  world  as  a  cellular  automaton.  (See  Chapter  8.) 

One  cannot  easily  represent  within  either  of  these  frameworks  such 
facts  as  “Today  is  my  programmer’s  birthday,”  or  “I  don’t  know  what 
you  mean,”  or  “San  Francisco  is  in  California,”  or  “Ned’s  phone- 
number  is  854-3662.” 
If we use human languages as an example, we can identify several
things an RP language should be able to express very easily.

Causality. The language should enable RP to express various forms
of  causality  relationships  between  situations  and  phenomena:  “fire 
causes  smoke.” 

Temporality.  The  language  should  be  able  to  express  that  one 
situation  precedes  another,  that  one  situation  follows  another  im¬ 
mediately,  that  one  situation  may  precede  another,  etc.  “Harry  will  get 
home  by  the  time  John  does.” 

Ability. The language should be able to express such notions as “X
can do Y” (perhaps with appropriate modifiers; e.g., “if X is given
certain knowledge”; thus, a person can open any combination safe, if
he knows its combination).

Relevance  and  Plausibility.  The  language  should  make  it  possible 
to  express  the  notion  that  certain  situations  or  problems  are  relevant  to 
each  other,  or  may  be  relevant  to  each  other,  though  perhaps  not  in 
any  known  way.  The  language  should  also  include  the  possibility  of 
expressing the plausibility and relevance of sentences: “These are all the
sentences necessary to describe the problem”; “X is analogous to T”;
“These  sentences  are  plausible.” 

Possibility  and  Probability.  The  language  should  be  able  to  express 
notions  of  indeterminacy  and  undecidability  and,  if  necessary,  treat 
them  mathematically. 

Knowledge and Certainty. The language should enable RP to express
that something is known: “John knows Bill’s phone-number”;
“John  knows  how  to  find  Bill’s  phone  number”;  “Someone  here  may 
know  what  time  it  is.” 

Desirability and Undesirability. The language should enable RP to
denote situations (and perhaps actions) as being desirable or undesirable.

Equivalence  and  Denotation.  RP  should  be  able  to  express  several 
different  types  of  equivalence,  such  as  “The  morning  star  is  the  evening 
star”; “The velocity is 50 mph”; “X² = X · X.”

Existence.  RP  should  be  able  to  say  that  some  things  exist  differ¬ 
ently  from  others:  “Z  is  a  solid”;  “T  is  an  expression  of  information.” 

Suppositionality or Hypotheticalness. RP should be easily able to
state that some of the statements it is using are “advanced for the sake
of discussion” (see Carnap, 1947, 1950; Quine, 1955-1964; Hintikka,
1962, 1969; and Rescher, 1964, 1967).

This  list  is  only  illustrative;  many  more  examples  could  be  added, 
and  each  example  could  be  treated  in  much  greater  detail.  It  is  also 
true that these examples overlap each other; for a more thorough treatment,
the reader should see the paper by McCarthy and Hayes (1968a).

One final thing to note on the subject of reasoning programs is
that the language used by an RP will typically be changed with time
by the RP. We should expect in general that a reasoning program will
find it necessary to define new words or to accept definitions of new
words; some of these words will denote new situations, actions, phenomena,
or relations; the RP may have to infer the language it uses for
solving a problem. Our most important requirement for the initial
language is that any necessary extensions to it be capable of being easily
added to it. For a further discussion on the nature of languages and
their use by machines, see Chapter 7. Predicate calculus has been suggested
as a possible basic language for an RP, and Chapter 6 discusses
computer programs capable of proving statements expressed in predicate
calculus theories. In the final section of this chapter, discussion is continued
on the subject of representation-independent problem solvers.


STATE-SPACE (SITUATION-SPACE)
PROBLEMS 

Representation 

This  section  discusses  the  situation-space  paradigm  itself  in  some 
detail, since it is perhaps the most popular one used by AI researchers,
and  since  there  has  been  a  considerable  theory  of  problem  solving, 
known  as  heuristic  search  theory,  developed  around  it. 

The  situation-space  paradigm  has  been  given  several  (slightly) 
different formalizations; in the literature of AI research it is usually
called  the  “state  space”  paradigm,  which  is  the  name  originally  given 
to  it  by  researchers  in  the  fields  of  operations  research  and  control 
theory.  In  this  discussion  “situation-space”  and  “state-space”  termi¬ 
nologies  are  used  somewhat  interchangeably,  as  defined  below.  The 
formalization  presented  is  essentially  that  of  Nilsson  (1971),  which 
gives  an  extensive  coverage  of  heuristic  search  theory.  Other  formaliza¬ 
tions  are  presented  in  Banerji  (1969),  Sandewall  (1969),  and  Quinlan 
and  Hunt  (1968). 

Anyone who wishes to understand the current directions of AI
research  should  make  an  effort  to  understand  the  state-space  paradigm. 
While  the  ideas  involved  are  not  very  difficult,  their  presentation  will  go 
easier  if  we  consider  a  simple  example.  Such  an  example  is  the  Three 
Coins  Problem,  which  is  stated  below.  After  reading  the  statement  of 
the problem, the reader is urged to solve it; its solution is very
straightforward.

Three  Coins  Problem 

Given  three  coins  arranged  as  in  Fig.  3-2,  make  them  all  the  same 
(i.e.,  either  all  heads  or  all  tails),  using  exactly  three  moves.  By  a  move 
in  this  case  is  meant  flipping  one  of  the  coins  over,  so  that  if  it  is  heads 
before  the  move,  it  becomes  tails  afterward,  etc. 


Figure  3-2.  Initial  state  of  the  Three  Coins  Problem. 


The  Three  Coins  Problem  can  be  easily  stated  as  a  state-space 
problem.  A  configuration  of  the  coins  is  a  state.  The  initial  state,  or 
start,  is  denoted  by  the  expression  HHT.  The  desired  states,  or  goals, 
are TTT and HHH. For any given state there are three possible
operators: “turn the first coin over”; “turn the second coin over”; and “turn
the  third  coin  over.”  A  move  corresponds  to  the  choice  of  one  of  these 
operators,  and  a  solution  to  the  problem  is  a  sequence  of  three  moves 
that  transforms  the  start  into  one  of  the  goals. 

Let us label the three operators as A, B, and C, respectively. Thus,
B applied to HHT yields HTT; we can briefly denote this fact by the
expression

$$\mathrm{HHT} \overset{B}{\longrightarrow} \mathrm{HTT}$$

Since B applied to HTT yields HHT, we shall, however, write

$$\mathrm{HHT} \overset{B}{\longleftrightarrow} \mathrm{HTT}$$

Given  this  notation,  the  diagram  shown  in  Fig.  3-3  depicts  the  state 
space  of  the  Three  Coins  Problem;  that  is,  all  the  possible  states  and 
the  result  for  each  state  of  applying  each  of  the  possible  operators  to  it. 
By  tracing  through  the  diagram,  we  see  that  one  sequence  of  moves 
which solves the problem is “first A, then C, then A,” or ACA for short.
The other solutions to the problem are AAC, CAA, BCB, BBC, CBB,
and  CCC;  each  of  these  leads  to  the  goal  HHH.  (There  is  no  way  to 
go  from  HHT  to  TTT  in  exactly  three  steps.) 
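
The list of solutions above can be checked by brute force. The following is a minimal sketch, not part of the original text: it tries all 27 sequences of three moves and keeps those that end in a goal state. The function and variable names are illustrative assumptions.

```python
from itertools import product

FLIP = {"H": "T", "T": "H"}
# Operator name -> index of the coin it turns over (A: first, B: second, C: third).
OPERATORS = {"A": 0, "B": 1, "C": 2}


def apply_operator(state, name):
    """Return the state obtained by turning over the coin that `name` refers to."""
    i = OPERATORS[name]
    return state[:i] + FLIP[state[i]] + state[i + 1:]


def solutions(start="HHT", goals=("HHH", "TTT"), moves=3):
    """List every operator sequence of exactly `moves` moves that ends in a goal."""
    found = []
    for seq in product("ABC", repeat=moves):
        state = start
        for name in seq:
            state = apply_operator(state, name)
        if state in goals:
            found.append(("".join(seq), state))
    return found


if __name__ == "__main__":
    for seq, goal in solutions():
        print(seq, "->", goal)
```

Running the sketch lists the seven sequences named in the text, each ending in HHH, and no sequence ending in TTT.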
Figure 3-3. A state-space for the Three Coins Problem.


We  shall  consider  a  state  to  be  a  finitely  describable  mathematical 
object;  in  the  Three  Coins  Problem  each  state  was  described  by  a  string 
of three letters (e.g., HTH). Other ways in which states can be
described include numbers, matrices, lists, graphs, sentences, sets, vectors,
and  trees.  (Graphs  and  trees  are  defined  below;  the  mathematical  notion 
of  “sentence”  is  discussed  in  Chapter  7.)  A  state  could  be  infinite,  but 
the  fact  that  it  has  a  finite  description  means  we  can  discuss  it  logically, 
prove  theorems  about  it,  etc.  However,  throughout  the  rest  of  this  book 
we  shall  be  concerned  only  with  finite  states.  From  the  computer’s 
standpoint  the  description  of  a  state  is  a  data-structure  (see  Knuth, 
1969). 

Similarly, an operator is a finitely describable means of transforming
one state into another state; there may be many ways of describing
a given operator, and from the computer’s standpoint an operator is a
computational procedure.

A  description  of  a  state-space  problem,  then,  is  the  specification  of 
three  things: 

S,  a  set  of  possible  starting  states 

F,  a  set  of  operators 

G,  a  set  of  desired  states,  or  goals 

A solution (or solution path) to a state-space problem is also the
specification of three things:

s,  one  of  the  possible  starting  states 

g,  one  of  the  desired  states 

a  finite  sequence  of  operators  that  transforms  s  into  g 

Thus, if $q$ is an operator and if we denote the result of applying $q$ to $s$
by the expression $q(s)$, and if $q_1, q_2, \ldots, q_{n-1}, q_n$ is a solution to a
state-space problem, then we have

$$g = q_n(q_{n-1}(\cdots q_2(q_1(s)) \cdots))$$

There may, of course, be many solutions to a given state-space problem
(S,F,G). We may consider a given (S,F,G) state-space problem to be
a collection of smaller state-space problems, each of the form ({s},F,G),
where s ∈ S; we shall say a procedure solves the (S,F,G) problem
if it is capable of producing a solution path for each of the
corresponding ({s},F,G) problems which has a solution.

The observant reader has probably noted that the definitions in
the preceding paragraph make no mention of the sequence $q_1, q_2, \ldots, q_n$
consisting of three or any other prespecified number of operators. Yet
we required in our informal statement of the Three Coins Problem that
the solution use exactly three moves, that is, that n be equal to 3. Can
this sort of requirement be made within the framework of the definitions
given in the preceding paragraph?

To see that it can, the Three Coins Problem is restated as follows:
Let the initial state s consist of the three coins, as in Fig. 3-2, and let
s also contain a “counter,” initially set to zero. (The counter is to be
capable of storing arbitrarily large numbers.) Denote the initial state s
by the expression (0,HHT). The three possible operators A, B, and C,
which we can apply to an arbitrary state (i,xyz), will now be
respectively: “turn coin x over and replace i by i + 1”; “turn coin y over
and replace i by i + 1”; and “turn coin z over and replace i by i + 1.”
Finally, the set G of goal states will contain two members: (3,HHH)
and (3,TTT). The solutions to this statement of the problem are the
same as our solutions to the previous statement.* On the other hand,
the state space described by this statement of the problem, and shown
in Fig. 3-4, is somewhat different from that of Fig. 3-3.
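
The restated problem can be written down directly as an (S,F,G) triple. The sketch below is an illustration under stated assumptions rather than the formulation actually given to any program: a state is a hypothetical Python pair (counter, coins), and the enumeration stops extending a path once the counter reaches 3, since no goal has a larger counter.

```python
FLIP = {"H": "T", "T": "H"}


def make_operator(index):
    """Build an operator that turns coin `index` over and replaces i by i + 1."""
    def op(state):
        count, coins = state
        flipped = coins[:index] + FLIP[coins[index]] + coins[index + 1:]
        return (count + 1, flipped)
    return op


S = {(0, "HHT")}                                   # starting states
F = {"A": make_operator(0), "B": make_operator(1), "C": make_operator(2)}
G = {(3, "HHH"), (3, "TTT")}                       # goal states


def solution_paths(start, max_counter=3):
    """Enumerate operator sequences leading from `start` to a state in G.

    The cutoff at `max_counter` is an assumption of this sketch: the counter
    only grows, so states beyond the goal counter can never reach a goal.
    """
    paths = []
    stack = [(start, "")]
    while stack:
        state, path = stack.pop()
        if state in G:
            paths.append(path)
            continue
        if state[0] >= max_counter:
            continue
        for name, op in F.items():
            stack.append((op(state), path + name))
    return paths


if __name__ == "__main__":
    for s in S:
        print(sorted(solution_paths(s)))
```

Note that (1,HHH), reached by applying C once, is not a goal here; the counter is what forces the path length to be exactly three.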


Figure  3-4.  Another  state-space  for  the  Three  Coins  Problem. 


It is possible to state many other problems within the (S,F,G)
format defined above (note 3-4). For some problems, especially those
that place restrictions on the desired paths from start to goal, it is
necessary to use “counters” or other devices. However, many problems
can be stated rather simply within the (S,F,G) state-space paradigm.
This is true despite the fact that such problems will often have solutions
that are very difficult to find. One reason for the popularity of the
(S,F,G) state-space paradigm within AI research is that it simplifies
the problem of stating problems that often have very difficult, hard-to-
find solutions.


* This is, incidentally, essentially the way the problem was stated to GPS,
which solved it.

Of course the requirement that states and operators be finitely
describable objects is not entirely consistent with “problems of the
real world.” We can expect a problem solver in the real world to
encounter things for which it does not have complete, finite descriptions.
The  statements  and  solutions  of  real-world  problems,  so  far  as  the 
mechanical  problem  solver  might  be  concerned,  would  still  be  finite 
descriptions,  but  they  could  be  incomplete.  A  real-world  mechanical 
intelligence  might  be  able  to  make  a  statement  like  “There  is  an  object 
on  the  road  ahead,  but  I  don’t  know  what  it  is;  I  had  better  slow  down 
and  try  to  see  what  it  is.”  In  general,  if  the  elements  of  a  given  problem 
are  partially  specified,  we  call  them  situations,  actions,  etc.,  whereas  if 
they  are  completely  described,  we  call  them  states  and  operators,  etc. 
Thus,  we  distinguish  between  situation-space  and  state-space  problems. 

Puzzles

None  of  the  foregoing  discussion  is  intended  to  deny,  however,  that 
state-space  problems  do  occur  in  real-world  environments  or  that  the 
study  of  state-space  problems  can  be  of  value  to  the  study  of  situation- 
space problems. Many real-world problems can be expressed in the
(S,F,G) paradigm. A classic example is the Traveling Salesman Problem,
which occurs in various forms in the scheduling of industrial
production (see the Exercises). Formalizations for the situation-space
paradigm are discussed in later sections of this chapter. It should be
emphasized that many of the techniques being developed for the solution
of state-space problems are directly applicable to situation-space
problems. Thus, “games of strategy” are one general class of
situation-space problems; Chapter 4 shows how the methods discussed in this
chapter can be extended to game playing. The state-space problems
considered in this chapter are essentially “one-person games of strategy”;
these problems are also commonly called puzzles.

An example of a puzzle that is easily stated within the (S,F,G)
format, yet for which solutions are difficult to find, is the famous
“15-Puzzle” (note 3-5). The puzzle uses a square tray adequate to
hold 16 square tiles, in which 15 tiles are placed, each marked with
a different number from 1 to 15. The space for the sixteenth tile is left
empty; one configuration of the tiles may be changed into another
configuration only by sliding a tile adjacent to the blank space into the
blank space (this, of course, moves the blank space in the opposite
direction). A “15-Puzzle Problem,” or 15-Problem, is completely
stated when we specify an initial configuration of the tiles and a goal
configuration. Figure 3-5 shows a typical 15-Problem.

Figure 3-5. A 15-Puzzle.

We can state a 15-Problem as an (S,F,G) state-space problem as
follows: A given configuration of tiles is a state. We shall denote each
possible state by a 4 x 4 matrix, whose elements have values from 0 to
15. Thus, the start and goal states of the problem are indicated in Fig.
3-5. For a given state $s$ we denote the number in the $i$th row and
the $j$th column of its matrix by the expression $s_{ij}$. Thus, $s_{2,3} = 3$ for
the start state being considered. For a given state $s$, let $i_0$ and $j_0$ be the
values of $i$ and $j$ such that $s_{ij} = 0$. We have $i_0 = 3$ and $j_0 = 2$ for the
start state. Given this notation, we can describe four operators:

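A. Replace $s_{i_0,j_0}$ by $s_{i_0,j_0+1}$ and $s_{i_0,j_0+1}$ by 0, if $j_0 + 1 \le 4$.

B. Replace $s_{i_0,j_0}$ by $s_{i_0+1,j_0}$ and $s_{i_0+1,j_0}$ by 0, if $i_0 + 1 \le 4$.

C. Replace $s_{i_0,j_0}$ by $s_{i_0,j_0-1}$ and $s_{i_0,j_0-1}$ by 0, if $j_0 - 1 \ge 1$.

D. Replace $s_{i_0,j_0}$ by $s_{i_0-1,j_0}$ and $s_{i_0-1,j_0}$ by 0, if $i_0 - 1 \ge 1$.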

These  correspond  to  moving  the  blank  space  “right,”  “down,”  “left,” 
and “up,” respectively. As is indicated in the description of the
operators, an operator may not be applicable to a given state. However, for
every  state,  at  least  two  operators  will  be  applicable.  Part  of  the  state 
space  for  the  problem  shown  in  Fig.  3-5  is  shown  in  Fig.  3-6. 
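
The four operators can be sketched as a single procedure parameterized by the direction in which the blank moves. This is an illustrative sketch only: the representation of a state as a tuple of four row tuples, with 0 for the blank, and all function names are assumptions, and the example state in the usage note is hypothetical rather than the start state of Fig. 3-5.

```python
def find_blank(state):
    """Return the 1-based (row, column) of the blank, which is stored as 0."""
    for i, row in enumerate(state, start=1):
        for j, value in enumerate(row, start=1):
            if value == 0:
                return i, j
    raise ValueError("state has no blank")


# Row and column offsets for A, B, C, D: the blank moves right, down, left, up.
MOVES = {"A": (0, 1), "B": (1, 0), "C": (0, -1), "D": (-1, 0)}


def apply_operator(state, name):
    """Return the successor state, or None if the operator is not applicable."""
    i0, j0 = find_blank(state)
    di, dj = MOVES[name]
    i, j = i0 + di, j0 + dj
    if not (1 <= i <= 4 and 1 <= j <= 4):
        return None                      # the blank would leave the tray
    grid = [list(row) for row in state]
    grid[i0 - 1][j0 - 1] = grid[i - 1][j - 1]   # slide the tile into the blank
    grid[i - 1][j - 1] = 0                      # the blank takes the tile's place
    return tuple(tuple(row) for row in grid)


def successors(state):
    """Map each applicable operator name to the state it produces."""
    result = {}
    for name in MOVES:
        nxt = apply_operator(state, name)
        if nxt is not None:
            result[name] = nxt
    return result
```

For a state with the blank in the lower right corner, such as ((1,2,3,4),(5,6,7,8),(9,10,11,12),(13,14,15,0)), only C and D are applicable, which agrees with the remark that at least two operators apply to every state.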

Altogether, there are 16! = 20,922,789,888,000 different states
in the state space of the 15-Puzzle. However, from any given starting
state, only half of these states can be reached, using the operators A, B, C,
and D. The other 10½ trillion cannot be reached, regardless of the
sequence of moves one tries (see Fig. 3-7). Computer programs have been
written which are capable of solving the 15-Puzzle, that is, of finding a
path between arbitrary start and goal states when such a path is possible
and of recognizing start- and goal-state pairs for which there is no such
path. (One such program is discussed below.) However, so far as the
author knows, no “general” problem-solving program (such as GPS,
REF-ARF, and FDS discussed above) yet written is capable of solving the
15-Puzzle: Programs that can currently solve the 15-Puzzle are “special
purpose.” We shall return to this point later.

Figure 3-6. Part of the state-space for the 15-Puzzle.

Figure 3-7. An unsolvable 15-Puzzle.
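
One way to recognize unreachable start and goal pairs is a parity argument; it is offered here as a sketch and is not claimed to be the method used by the programs mentioned in the text. Every move exchanges the blank with one tile, which changes the parity of the number of inversions in the row-by-row listing of the 16 entries, and it also moves the blank by one row or one column. The sum of the two quantities therefore keeps its parity under every move, so states with different parities can never reach each other. That states with equal parity can always reach each other is a classical result which this sketch assumes and does not prove. The names below are illustrative.

```python
def parity_class(state):
    """Return 0 or 1, a quantity that no legal move of the blank can change."""
    flat = [value for row in state for value in row]
    # Count pairs that appear out of order, treating the blank (0) as an entry.
    inversions = sum(
        1
        for a in range(len(flat))
        for b in range(a + 1, len(flat))
        if flat[a] > flat[b]
    )
    blank_row, blank_col = divmod(flat.index(0), 4)
    return (inversions + blank_row + blank_col) % 2


def same_class(start, goal):
    """True if the parity argument does not rule out a path from start to goal."""
    return parity_class(start) == parity_class(goal)
```

For example, the ordered arrangement with the blank in the last position and the same arrangement with tiles 14 and 15 exchanged fall into different classes, so neither can be reached from the other.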
Problem Reduction and Graphs

For the discussions that follow, and for the reader who wishes to
do more investigation on his own, it is helpful to introduce a special
terminology: The diagrams shown in Figs. 3-3, 3-4, and 3-6 represent
what mathematicians call graphs (note 3-6). A graph is a (possibly
infinite) collection of nodes and arcs; so far, we have made nodes
correspond to states and arcs to operators. Arcs are usually drawn as
directed lines, or arrows. If an arc leaves one node, say A, and enters
another, say B, we say A is a parent of B and B is a successor of A.
If it is necessary to be more explicit, we often say B is a successor of A
“under the operator q,” etc. If A and B are successors of each other,
we often replace the two arcs between them by a single edge, drawn
either as a line segment or as a two-headed arrow. If a node has no
successors, it is said to be terminal. A sequence of arcs and nodes
leading from a given node A to a given node B is called a path from A to
B. If A and B are connected by a path, we say A is an ancestor of B
and B is a descendant of A. Thus, in Fig. 3-3, TTH is both an ancestor
and a descendant of THT.

It should be evident from these definitions that an (S,F,G) problem
essentially involves finding paths between prespecified nodes in a
graph.  The  nodes  in  the  graph  correspond  to  states  in  the  state  space, 
and  the  edges  (or  arcs,  or  connections)  between  nodes  correspond  to 
the  application  of  operators.  We  often  refer  to  the  state  space  of  a  state- 
space  problem  as  a  state-space  graph,  and  use  the  words  “node”  and 
“state”  interchangeably. 

For  some  state-space  problems  the  state-space  graph  may  be  so 
small  that  it  can  be  defined  explicitly  and  shown  in  a  picture  (e.g., 
Fig.  3-4);  in  other  cases  the  graph  may  be  so  huge  that  it  can  be 
defined  only  implicitly,  and  we  can  draw  only  pictures  of  very  small 
portions of it; such was the case with the 15-Puzzle. In most problems
that have been investigated by AI researchers, solutions can be
explicitly indicated once they have been found; that is, one can usually
either draw a diagram of the solution or state the solution by listing a
series of symbols, each standing for a particular operator, as we did
with the Three Coins Problem. However, in some problems even the
solutions involve huge graphs and must be stated implicitly; some
problems of this sort are games such as Checkers and Chess.

One of the most useful aspects of the state-space paradigm is that
it can, in a sense, be applied to itself. Instead of identifying nodes by
states and arcs by operators, we can identify nodes by problems and
arcs by operators that change problems into other problems. We refer
to finding a good path through a graph of this sort as a
problem-reduction problem. The graph of a problem-reduction problem is known as
an AND/OR graph, for reasons we shall learn in a moment.

Problem-reduction problems can be developed as a natural extension
of (S,F,G) state-space problems. To see how, consider the simplest
possible (S,F,G) state-space problem: What would it be?

Well, there are many extremely simple (S,F,G) problems, but all
are approximately of the same form. We can classify three types of
trivial or primitive (S,F,G) problems:

1. Problems of the form (S, {q}, G), where there is only one
operator available.

2. Problems of the form ({s}, F, {s}), in which no operator
need be applied; more generally, problems of the form
(S,F,G) where S ∩ G ≠ ∅, that is, in which some start state
is also a goal state.

3. Problems of the form (S, { }, G) in which no operator can
be applied and there is no start state that is also a goal state.

The first two types of problem are trivially solvable; the last
is trivially unsolvable.

Basically,  the  problem-reduction  approach  consists  of  finding 
operators  that  are  capable  of  transforming  complex  (S,F,G)  problems 
into  primitive  (S,F,G)  problems.  The  particular  operators  one  uses  will 
depend upon the initial, complex (S,F,G) problem, and it may often
be  very  difficult  to  find  good  problem-reduction  operators. 

In general, an AND/OR graph contains two types of nodes: problem
nodes and AND-nodes. These nodes are usually called subgoals when it
is not necessary to distinguish them. The arcs connecting problem nodes
to problem nodes, and problem nodes to AND-nodes, represent the
application of problem-reduction operators. Those connecting
AND-nodes to problem nodes will be referred to as AND-links. The AND-links
from an AND-node usually subtend a circular arc, as shown here.

A typical, small AND/OR graph is shown in Fig. 3-8. A problem
node is said to be solvable if it is trivially solvable, or if any of its
successor nodes is solvable. On the other hand, a problem node is
unsolvable if it is trivially unsolvable, or if all its successor nodes are
unsolvable. An AND-node is unsolvable if at least one of its successor
nodes is unsolvable; otherwise it is solvable.
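
These definitions translate almost directly into a recursive test. The sketch below makes several assumptions that are not in the text: the graph is finite and has no cycles, it is stored as a Python dictionary, and a problem node with no successors that is not marked primitive counts as unsolvable. The node names and the example graph are hypothetical.

```python
def solvable(node, graph):
    """Decide solvability of `node` in an acyclic AND/OR graph.

    `graph` maps a node name to one of:
      ("primitive", True)    a trivially solvable problem
      ("primitive", False)   a trivially unsolvable problem
      ("problem", [...])     a problem node with its successor nodes
      ("and", [...])         an AND-node with its successor nodes
    """
    kind, payload = graph[node]
    if kind == "primitive":
        return payload
    results = (solvable(child, graph) for child in payload)
    if kind == "and":
        return all(results)      # unsolvable if at least one successor is
    return any(results)          # a problem node needs only one solvable successor


EXAMPLE = {
    "start": ("problem", ["p1", "and1"]),
    "p1": ("primitive", False),
    "and1": ("and", ["p2", "p3"]),
    "p2": ("primitive", True),
    "p3": ("problem", ["p4"]),
    "p4": ("primitive", True),
}

if __name__ == "__main__":
    print(solvable("start", EXAMPLE))
```

In the example, "start" is solvable through the AND-node even though its other successor is trivially unsolvable; making "p4" unsolvable would make the AND-node, and hence "start", unsolvable as well.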

Figure 3-8. A small AND/OR graph.

Good examples of the problem-reduction approach are the “symbolic
integration” problem solvers, such as SAINT (by Slagle, 1963) and
SIN (Moses, 1967). These programs are capable of evaluating integrals
such as

$$\int \frac{x^4}{(1 - x^2)^{5/2}}\,dx \qquad (3\text{-}1)$$

in a symbolic fashion similar (especially in the case of SAINT) to the
way in which people go about solving such problems.

SAINT was constructed to use a table of trivial integral forms,
such as

$$\int u^n\,du = \frac{u^{n+1}}{n+1} \quad (n \neq -1)$$

$$\int \sin u\,du = -\cos u$$

and  it  was  given  problem-reduction  operators  corresponding  to  various 
rules for the transformation of integrals, such as the integration-by-parts
rule, the sum-decomposition rule, and certain trigonometric and
algebraic  substitution  rules.  For  our  purpose,  it  is  not  necessary  to 
understand  these  rules  or  integral  calculus. 

When  SAINT  was  given  an  expression  like  (3-1),  it  would  attempt 
to  reduce  the  expression  to  a  combination  of  the  trivial  integrals  in  its 
table,  by  the  proper  application  of  its  problem-reduction  operators.  In 
most  cases  its  success  at  doing  integration  problems  in  this  way  was 
at  about  the  level  of  a  good  first-year  calculus  student. 

Figure 3-9 shows a portion of the AND/OR graph constructed by
SAINT in its solution for the problem expression (3-1). The top part of
the graph is similar to Figs. 3-3, 3-4, and 3-6. The trivial integral forms,
or primitive problems, are the dark-bordered square nodes at the
bottom of the figure. The first operator applied by SAINT was
“trigonometric substitution”; this transformed the start node into the problem at
node A in the figure. Then SAINT applied two operators and obtained
node B. Since SAINT estimated B as being a difficult problem, the
program went back to A, applied another reduction operator, and obtained
node C. But C also looked difficult, so SAINT went back to A and
applied a sequence of three operators, labeled “trigonometric identity,”
“trigonometric substitution,” “algebraic identity,” and obtained node
D. Then SAINT applied the reduction operator “sum-decomposition,”
which transformed D into three problems (E, F, and G) linked together
by an AND-node; the AND-node between node D and nodes E, F, and G
means that D can be solved if E, F, and G can all be solved. Since F
turned out to be primitive, or trivially solvable, E and G were quickly
reduced to primitive problems.

Figure 3-9. A portion of SAINT’s AND/OR graph for an integration
problem. (Adapted with permission from Nilsson, 1971)

Thus,  SAINT  deduced  the  following  facts: 

1.  Start  can  be  solved  if  A  can  be  solved. 

2.  A  can  be  solved  if  B,  or  C,  or  D  can  be  solved. 

3.  D  can  be  solved  if  E  and  F  and  G  can  be  solved. 

4.  E,F,  and  G  can  all  be  solved. 

5.  Start  can  be  solved. 

Having proved that its initial problem could be solved, SAINT was
able to construct the actual solution

$$\int \frac{x^4}{(1 - x^2)^{5/2}}\,dx = \arcsin x + \frac{1}{3}\tan^3(\arcsin x) - \tan(\arcsin x) \qquad (3\text{-}2)$$

by first solving E, F, and G and then undoing the sequence of
substitutions it had used in going from start to A to D. SAINT required about
11 minutes to solve this problem.

Notice that nothing has been said about how SAINT “estimated”
the difficulty of problems. This subject is left for the next section; for
the moment, we have concentrated on the nature of problem-reduction
problems and AND/OR graphs.

The SIN program by Moses is more sophisticated than SAINT.
SIN itself might be said to constitute a single reduction operator, which
in most cases is capable of going directly from problem to solution
without generating an AND/OR graph. SIN is capable of solving
integration problems “at the difficulty approaching those in the larger integral
tables” (Moses, 1967). For example, SIN can evaluate problem (3-1)
in about 9 seconds (note 3-7); in doing so, it generates only two
subgoals in contrast to the 13 required by SAINT for the same problem.


Summary 

We  have  seen  two  ways  of  stating  problems  that,  in  effect,  ask  the 
problem  solver  to  find  paths  connecting  prespecified  sets  of  nodes  in 
graphs.  In  many  cases  (including  problem-reduction  problems)  the 
relevant  graphs  may  be  too  large  to  store  or  generate  completely  by 
computer;  this  is  probably  true  for  most  of  the  important  problems  a 
mechanical intelligence might be called upon to solve. How is it
possible to solve problems of this sort? What enables a computer to avoid
generating  10  trillion  states  of  the  15-Puzzle,  and  yet  still  solve  the 
problem? 
HEURISTIC SEARCH THEORY
Need  for  Search 

The field of AI research concerned with ways that computers can
solve large state-space problems is known as heuristic search theory. In
this section we introduce the reader to some of the central concepts of
this field; more thorough discussions are provided by Nilsson (1971),
Banerji (1969), Pohl (1970), and Michie (1971).

Given  a  finite  description  of  a  state-space  problem,  a  computer 
can  be  programmed  to  generate  the  state  space  of  the  problem  while 
checking  to  see  if  it  has  produced  a  solution.  The  generation  process 
consists  simply  of  producing  finite  descriptions  (data-structures)  for 
the  nodes  of  the  state  space  and  for  their  connections  to  each  other. 
With a large, difficult state-space problem, it will not be possible for
the computer to generate descriptions for each of the nodes and
connections between nodes of the state space of that problem. Rather, the
computer may generate only a relatively small portion of that state space,
and can check only that portion, to see whether it includes a path
between nodes, which is a solution to the problem. With suitable
programming, the computer may generate a portion of the state space containing
on  the  order  of  10^  nodes  (for  some  problems  it  may  be  necessary  and 
possible  to  generate  a  few  orders  of  magnitude  more;  conversely,  the 
“general  problem  solvers”  discussed  in  this  chapter  typically  may  gen¬ 
erate  no  more  than  100  nodes),  whereas  the  state  space  of  a  difficult 
problem  may  easily  contain  W  nodes.  Thus,  it  is  clear  that  the  com¬ 
puter  must  be  somewhat  selective  in  the  way  that  it  generates  the  por¬ 
tion  of  a  state  space  that  it  produces  when  trying  to  solve  a  state-space 
problem,  if  it  is  to  be  successful.  Any  procedure  that  a  computer  uses 
to  generate  a  portion  of  the  state  space  for  a  problem,  and  to  check 
that  portion  for  a  solution,  is  said  to  search  the  state  space,  and  is  called 
a  “search  procedure.”  In  this  section  we  are  interested  in  ways  that 
search  procedures  can  be  designed  to  be  “selective”;  that  is,  ways  they 
can  be  successful  at  finding  a  solution  to  a  state-space  problem  without 
generating  the  entire  state  space  of  that  problem. 

Of course a search procedure might find a solution for a problem
simply by randomly generating descriptions for nodes and their
interconnections, but unless a large percentage of the paths through the state
space of a problem happen to be solution paths, such a procedure will
not generally be successful. Usually, what we desire in a search
procedure is that it somehow be “systematically oriented” toward the
problem it is being used to solve, in such a way that it can find a solution
without generating the entire state space. A search procedure that is
systematically oriented toward a problem will be said to embody
heuristic (i.e., “serving to discover”) information and will be called a
heuristic search procedure. (Ways of achieving “systematic orientation”
are discussed below.) If we can prove that a search procedure will
always find a solution, if there is one, to any state-space problem
({s},F,G) such that s ∈ S, then we say the search procedure is an
algorithmic search procedure for the state-space problem (S,F,G). It
is possible for a given search procedure to be either heuristic or
algorithmic or both or neither, with respect to some state-space problem
(S,F,G). Most often the search procedures used by problem-solving
computers are heuristic, but not algorithmic; sometimes they are both
(thus, a symbolic integration program using the Risch algorithm (note
3-7) would be heuristic and algorithmic, according to our definitions).
Again, for large state-space problems, there is little value to a search
procedure that is algorithmic but not heuristic, one that would solve
the problem but might have to search the entire state space to do so.

Thus, “heuristic programming” refers to computer programs that
employ procedures not necessarily proved to be correct, but which seem
to be plausible. Most problems that have been considered by AI
researchers are of the sort where no one knows any practical, completely
correct procedures to solve them; therefore, a certain amount of
proficiency in using hunches and partially verified search procedures is
necessary to design programs that can solve them. So, by a heuristic is
meant some rule of thumb that usually reduces the work required to
obtain a solution to a problem. (Again, it may be possible to prove that
the heuristic will always supply solutions to some set of problems, i.e.,
that it is algorithmic.) Clearly, much of the conscious thinking that
people do is based upon the use of heuristics that have not been shown
to be algorithms.* The realization of this fact and its incorporation in
the design of computer programs was an important step in the
development of artificial intelligence, signifying a recognition by AI researchers
that intelligence is often exhibited in situations where one’s
understanding and knowledge are incomplete.

Search  Procedures 

There are basically two methods of incorporating heuristic
information about (i.e., “systematic orientation” toward) a state-space problem
into a search procedure designed to solve that problem; these methods
correspond to the use of “generator functions” and “evaluation
functions.” Our description of these methods will be facilitated if we
examine in a little more detail the generation processes that may be used
by search procedures.

* We might have a hard time proving this to a strict behaviorist. This is one
of the places where the author invokes his “personalistic license,” granted in note
1-2.

The generation processes that AI researchers have investigated for
state-space  problems  are  made  up  of  the  following  basic  steps:  First, 
a  start  node  s  is  given  to  the  search  procedure.  This  node  corresponds  to 
a  finite  description  of  a  state  and  is  stored  by  the  computer  as  a  data- 
structure. 

Next, using the operators (in the set F of the state-space problem),
the successors to the start node are generated (i.e., a finite description
for each successor is generated). We denote by Γ a procedure that
calculates all successors to a given node. The process of applying Γ to
a node is known as expanding the node, or generating the successors
to the node; thus, Γ is often referred to as a generator function, or
generator.

After a node is expanded, pointers are set up, leading back to the
node  from  each  of  its  successors.  If  a  goal  node  is  ever  generated,  then 
there  will  be  pointers  indicating  a  path  from  it  back  to  the  start  node. 

The  successor  nodes  produced  when  a  node  is  expanded  are 
checked  to  see  if  one  of  them  is  a  goal  node.  If  no  goal  node  is  found, 
then  the  process  of  expanding  nodes  and  setting  up  pointers  is  con¬ 
tinued  by  expanding  nodes  that  have  been  generated  as  successors.  If  a 
goal  node  is  found,  then  the  pointers  that  have  been  set  up  are  used  to 
trace  a  path  back  to  the  start  node — the  operators  that  were  originally 
used by Γ to produce the nodes along this path may be recovered and
used  to  produce  a  solution  path. 
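
The generation process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the integer states, the two operators `add1` and `double`, and the names `expand`, `solve`, and `trace_operators` are hypothetical, and the sketch skips any state that has already been generated (a convenience the text does not require).

```python
from collections import deque

# Hypothetical toy problem: states are integers, and each operator in F
# maps a state to one successor state.
OPERATORS = {"add1": lambda n: n + 1, "double": lambda n: n * 2}

def expand(node):
    """Generator function: return (operator name, successor) pairs for a node."""
    return [(name, op(node)) for name, op in OPERATORS.items()]

def trace_operators(pointer, node):
    """Follow the back-pointers to the start node; return the operators in order."""
    operators = []
    while pointer[node] is not None:
        node, name = pointer[node]
        operators.append(name)
    return operators[::-1]

def solve(start, is_goal):
    """Expand nodes in the order generated, keeping one back-pointer per node."""
    if is_goal(start):
        return []
    pointer = {start: None}        # node -> (parent, operator that produced it)
    unexpanded = deque([start])
    while unexpanded:
        node = unexpanded.popleft()
        for name, child in expand(node):
            if child in pointer:   # assumption: ignore states generated before
                continue
            pointer[child] = (node, name)
            if is_goal(child):     # successors are checked as they are generated
                return trace_operators(pointer, child)
            unexpanded.append(child)
    return None

solution = solve(1, lambda n: n == 10)
```

Here `solution` is the list of operator names along the path that the back-pointers trace from the goal node to the start node.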

The various search procedures developed for solving state-space
problems  may  be  distinguished  from  each  other  on  the  basis  of  two 
criteria:  how  the  process  of  expanding  nodes  and  setting  up  pointers  is 
continued,  and  the  nature  of  their  generator  functions.  A  search  pro¬ 
cedure  that  expands  nodes  in  the  order  in  which  they  are  generated, 
after  generating  all  of  them  below  a  given  node,  is  called  a  breadth-first 
search  procedure.  A  search  procedure  that  always  expands  the  most 
recently  generated  node  first  is  called  a  depth-first  search  procedure. 
Figures  3-10  and  3-11  show  “snapshots”  of  the  successive  portions 
of  a  state-space  graph  that  would  be  generated  by  breadth-first  and 
depth-first  search  procedures.  Both  types  of  search  procedure  are  ex¬ 
amples  of  blind  search  procedures  because  the  order  in  which  they  ex¬ 
pand  nodes  is  unrelated  to  the  actual  location  of  goal  nodes  in  the  state 
space (unless their generator functions incorporate heuristic information;
see below). Thus, they are not heuristic search procedures.

Figure 3-10. Snapshots of the search produced by a breadth-first procedure
(panels A through D; panel A shows the expansion of the start node).
Dotted circles, ungenerated nodes; solid circles, generated nodes.
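
The difference between the two blind procedures comes down to which unexpanded node is chosen next. The following sketch makes that concrete on a small, hypothetical problem tree (the node names and the function `blind_search` are illustrative assumptions, not taken from the figures).

```python
# Hypothetical problem tree: each node maps to the list of its successors.
TREE = {"s": ["a", "b"], "a": ["c", "d"], "b": ["e", "f"]}

def blind_search(start, goal, order):
    """Return the nodes expanded before `goal` is generated (None if never).

    order="breadth": expand nodes in the order in which they were generated.
    order="depth":   always expand the most recently generated node first.
    """
    unexpanded = [start]
    expanded = []
    while unexpanded:
        node = unexpanded.pop(0) if order == "breadth" else unexpanded.pop()
        expanded.append(node)
        for child in TREE.get(node, []):
            if child == goal:
                return expanded
            unexpanded.append(child)
    return None

breadth_order = blind_search("s", "d", "breadth")
depth_order = blind_search("s", "d", "depth")
```

With the goal placed at node `d`, the breadth-first order expands only `s` and `a`, whereas the depth-first order first exhausts the subtree below `b`. Neither order uses any information about where the goal lies.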

The  breadth-first  search  procedure  is  algorithmic:  If  a  path  does 
exist  from  a  given  start  node  to  a  goal  node,  it  will  eventually  be  pro¬ 
duced,  using  this  procedure.  It  is  possible  for  the  depth-first  procedure 
to  search  forever,  going  off  in  the  wrong  direction,  without  finding  a 
solution  path,  even  though  one  might  exist.  So,  the  depth-first  procedure 
as  stated  here  is  not  algorithmic.  However,  it  may  be  modified  to  an 
algorithmic  procedure  by  introducing  the  concept  of  the  “depth”  of  a 
node  (relative  to  the  given  start  node) :  The  depth  of  a  node  is  zero  if 
it  is  the  start  node,  and  is  one  plus  the  depth  of  its  parent  otherwise.  A 
bounded depth-first search procedure is one which expands that previously
generated, unexpanded node which has the greatest depth less
than the depth (or level) bound l established for the procedure. (If
there is more than one such node, it expands the one most recently
generated.) As illustrated by the snapshots in Fig. 3-12, a bounded
depth-first procedure generates nodes in a depth-first manner until it
reaches its depth bound; it then “backs up” and generates more nodes
in a different direction, etc. It is fairly simple to see how this idea may
be extended (essentially by allowing the depth bound to be systematically
increased) to produce an algorithmic search procedure with a
basically “depth-first” nature.

Figure 3-11. Snapshots of a search produced by a depth-first procedure
(panel A shows the expansion of the start node). Dotted circles,
ungenerated nodes; solid circles, generated nodes.
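
A minimal sketch of the bounded depth-first procedure, and of the extension obtained by systematically increasing the bound, is given below. The infinite binary tree used as the state space (each integer n has successors 2n and 2n + 1) and the function names are assumptions made for illustration; an unbounded depth-first procedure would never return on this space.

```python
def bounded_depth_first(start, successors, is_goal, bound):
    """Depth-first search that never expands a node at or beyond `bound`."""
    if is_goal(start):
        return [start]
    stack = [(start, 0, [start])]          # (node, depth, path from the start)
    while stack:
        node, depth, path = stack.pop()    # most recently generated node
        if depth >= bound:
            continue                       # generated, but never expanded
        for child in successors(node):
            if is_goal(child):
                return path + [child]
            stack.append((child, depth + 1, path + [child]))
    return None

def increasing_bound_search(start, successors, is_goal, max_bound):
    """Repeat the bounded search with bounds 1, 2, ..., max_bound."""
    for bound in range(1, max_bound + 1):
        path = bounded_depth_first(start, successors, is_goal, bound)
        if path is not None:
            return path
    return None

# Hypothetical infinite state space.
binary_tree = lambda n: [2 * n, 2 * n + 1]
found = increasing_bound_search(1, binary_tree, lambda n: n == 5, max_bound=5)
```

With a bound of 1 the procedure backs up without finding node 5; with a bound of 2 it reaches it. The cap `max_bound` is only there to keep the sketch finite when no goal exists.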

Figure 3-12. Snapshots of a search produced by a bounded depth-first
procedure (panel A shows the expansion of the start node). Dotted circles,
ungenerated nodes; solid circles, generated nodes.

Most heuristic search procedures are, in effect, modifications of the
bounded depth-first search procedure. As explained initially, heuristic
search  procedures  rely  on  two  methods,  the  use  of  generator  functions 
and  the  use  of  “evaluation  functions.”  A  generator  function  may  in¬ 
corporate  heuristic  information  about  a  problem  if  it  is  designed  to 
generate first those successors of the node to which it is applied
which are most likely to lie on (preferably short) paths to some goal
node.  A  search  procedure  that  uses  such  a  generator  will  tend  to  be 
“guided”  toward  a  solution  if  it  expands  nodes  according  to  the  order 
in  which  they  are  produced  by  its  generator.  Thus,  such  a  search  pro¬ 
cedure  is  “systematically  oriented”  by  its  generator  toward  searching 
the  most  promising  portions  of  the  state  space  first. 

An  evaluation  function  is  some  procedure  that  can  be  applied  to 
the  finite  description  of  a  node  in  a  state-space  problem  and  which  will 
produce  an  estimate  of  the  “value”  of  that  node  (the  likelihood  that  the 
node  lies  on  a  path  to  a  goal  node).  AI  researchers  have  investigated  a 
variety  of  different  kinds  of  evaluation  functions:  For  example,  Slagle 
and  Bursky  (1968)  designed  an  evaluation  function  that  estimated  the 
probability  that  a  given  node  would  be  on  a  path  to  a  goal  node;  Quinlan 
and  Hunt  (1968)  used  an  evaluation  function  that  constructed  a  differ¬ 
ence  set,  measuring  the  (structural)  differences  between  an  arbitrary 
node  and  a  given  goal  node;  Samuel  (1959,  1967)  used  an  evaluation 
function  that  examined  the  important  “features”  possessed  by  a  board 
configuration  in  checkers,  to  produce  an  estimate  of  the  “strategic 
value”  of  the  configuration  (see  Chapter  4). 

The  central  results  in  heuristic  search  theory  are  those  of  Hart, 
Nilsson,  and  Raphael  (1968)  and  Pohl  (1970).  Their  results  hold  for 
evaluation  functions  that  produce  numerical  estimates  for  the  “values” 
of nodes in state spaces. By convention, if f is an evaluation function,
and n and n' are nodes in a state space, then we say that n is more
valuable than n' if f(n) < f(n'); the lower the number assigned to a
node by the evaluation function, the greater is the “value” of that node.
An ordered search procedure using the evaluation function f is a search
procedure that expands the previously generated, unexpanded node n
for which f(n) is a minimum; if there is more than one such node n,
then it expands the most recently generated one. The Hart-Nilsson-Raphael
result may be stated as follows: For a given state-space problem
({s},F,G), let g(n) be the depth of node n from the start node; let
h(n) be an estimate of the length of the shortest path from n to a goal
node of the state space, and let h_p(n) be the actual length of the shortest
path from n to a goal node. If for any node n we have h(n) ≤ h_p(n),
then an ordered search procedure using the evaluation function f(x) =
g(x) + h(x) will always find the shortest solution path for the state-space
problem ({s},F,G), if there is a solution path at all. Furthermore,
provided h(x) is generally greater than zero, the ordered search procedure
will often need to expand fewer nodes to produce its solution
than would the breadth-first search procedure. Thus, such an ordered
search procedure is both algorithmic and heuristic. If we relax the
condition that h(x) ≤ h_p(x), the ordered search procedure will still be
heuristic, but it may not be algorithmic. (More specific information
about h would be needed to determine whether it is algorithmic.)
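
The ordered search procedure with f(x) = g(x) + h(x) can be sketched as follows. The toy problem (integer states, operators "add one" and "double", goal 10) and the estimate `h` are assumptions chosen so that h never exceeds the true remaining path length; the sketch treats the space as a tree and tests for the goal when a node is selected for expansion.

```python
import heapq
from itertools import count

GOAL = 10

def successors(n):
    """Hypothetical operators: add one, or double, never overshooting GOAL."""
    return [m for m in (n + 1, n * 2) if m <= GOAL]

def h(n):
    """Lower bound on the steps to GOAL: no operator more than doubles a state."""
    steps = 0
    while n < GOAL:
        n, steps = n * 2, steps + 1
    return steps

def ordered_search(start, estimate):
    """Expand the unexpanded node of minimum f = g + estimate; ties go to the
    most recently generated node. Returns (path, number of nodes expanded)."""
    tick = count()
    # Entries are (f, -generation time, g, path); a smaller second field
    # means a more recently generated node.
    open_nodes = [(estimate(start), -next(tick), 0, [start])]
    expanded = 0
    while open_nodes:
        _, _, g, path = heapq.heappop(open_nodes)
        node = path[-1]
        if node == GOAL:
            return path, expanded
        expanded += 1
        for child in successors(node):
            f = g + 1 + estimate(child)
            heapq.heappush(open_nodes, (f, -next(tick), g + 1, path + [child]))
    return None, expanded

guided_path, guided_count = ordered_search(1, h)
blind_path, blind_count = ordered_search(1, lambda n: 0)
```

Setting the estimate to zero everywhere reduces f to the depth g, so the second call behaves essentially like a breadth-first search. Both calls return a shortest path, but the guided call expands fewer nodes on this problem.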

Search Trees

Actually,  the  presentation  of  heuristic  search  theory  thus  far  is 
precisely  correct  only  for  problems  whose  state-space  graphs  have  the 
nature  of  a  “tree.”  A  tree  is  a  graph  with  the  following  characteristics: 

1. The tree contains exactly one node that does not have a
parent; this node is called the root node.

2.  Every  other  node  in  the  tree  is  a  descendant  of  the  root  node. 

3.  Every  other  node  in  the  tree  has  exactly  one  parent. 

If the graph for a state-space problem ({s},F,G) is a tree, then
the start node s will, of course, be the root node of the tree. A tree that
is a graph for a state-space problem is often called a problem tree for
that problem. Figure 3-4 shows a portion of the problem tree for the
Three Coins Problem.® A basic modification is needed to make an
ordered search procedure, using the evaluation function f(x) = g(x) +
h(x), produce an optimal solution (in the sense of the preceding paragraph)
when searching a state-space graph that is not a tree. The modification
consists of providing the procedure with a means of “updating” its
function g(x); a node in a (general, non-treelike) graph may have more
than one parent. Thus, we should define the “depth” of a node n in a
graph to be zero when it is the start node s; otherwise, we should define
it as one plus the depth of its shallowest parent. The ordered search
procedure may generate a node n when expanding a node n' with a
depth of d' and later generate the node n again when expanding a node
n'' with a depth d''. If d'' < d', then the procedure should change its
estimate for the depth of n, from d' + 1 to d'' + 1 (and it should make
a similar change in its estimate of the depths of those nodes that are
descendants of n).
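
The depth-updating step can be illustrated in isolation. In this sketch the dictionaries `depth` and `children` and the function `note_generated` are hypothetical bookkeeping structures; a complete search procedure would call the function each time a node is generated.

```python
def note_generated(depth, children, parent, node):
    """Record that `node` was generated from `parent`; if this gives `node`
    a shallower parent, lower its depth and that of its descendants."""
    children.setdefault(parent, []).append(node)
    candidate = depth[parent] + 1
    if depth.get(node, candidate + 1) <= candidate:
        return                              # existing estimate is no worse
    depth[node] = candidate
    pending = [node]
    while pending:                          # propagate to the descendants
        current = pending.pop()
        for child in children.get(current, []):
            if depth[current] + 1 < depth[child]:
                depth[child] = depth[current] + 1
                pending.append(child)

depth, children = {"s": 0}, {}
for parent, node in [("s", "a"), ("a", "b"), ("b", "n"), ("n", "m")]:
    note_generated(depth, children, parent, node)
depth_before = dict(depth)                  # n is at depth 3, m at depth 4
note_generated(depth, children, "s", "n")   # n is generated again, from s
```

After the last call, n has the start node as its shallowest parent, so its depth estimate drops from 3 to 1 and that of its descendant m drops from 4 to 2.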

® Problem trees are normally drawn upside down, with the root node at the
top (see Knuth, 1969a, p. 307).

Pohl (1970) presented similar results for an ordered search procedure
using the evaluation function f(x) = (1 − ω)g(x) + ωh(x),
where ω is an adjustable parameter. (Pohl also discusses bidirectional
search procedures, which are procedures that generate the state space of
a problem outward from both the start and goal nodes; he concluded
that such procedures are much more difficult to implement efficiently
than are the ordinary “one-directional” searches we have discussed.)
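
A small worked example may help show what the adjustable parameter does in Pohl's weighted evaluation function. The three nodes and their (g, h) values below are invented for illustration.

```python
def weighted_value(g, h, omega):
    """Weighted evaluation function f = (1 - omega) * g + omega * h."""
    return (1 - omega) * g + omega * h

# Hypothetical unexpanded nodes, each with (depth g, estimate h).
NODES = {"A": (1, 6), "B": (2, 3), "C": (5, 1)}

def best_node(omega):
    """Name of the node that an ordered search would expand next."""
    return min(NODES, key=lambda name: weighted_value(*NODES[name], omega))

choices = {omega: best_node(omega) for omega in (0.0, 0.5, 1.0)}
```

With ω = 0 only depth matters and the shallowest node A is chosen; with ω = 1 only the estimate matters and C is chosen; with ω = 0.5 the ranking is the same as for g + h, and B is chosen.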

It  is  often  desirable  to  design  the  generator  function  used  by  a 
search  procedure  to  have  a  “memory,”  and  to  generate  the  successors 
to  a  given  node  in  a  one-at-a-time  manner  that  can  be  interrupted  and 
resumed  when  necessary.  If  such  a  generator  function  incorporates 
heuristic  information  (thus  tending  to  generate  first  those  successors  to 
a  given  node  that  have  the  greatest  value),  then  a  search  procedure  that 
uses  it  may  search  the  most  promising  parts  of  the  state  space  first, 
without  needing  to  generate,  store,  or  evaluate  the  less  plausible  nodes. 
And,  if  its  first  searches  of  the  state  space  do  not  succeed  in  producing 
a  goal  node,  the  procedure  may  then  reapply  its  generator  function  and 
search  less  plausible  parts  of  the  state  space.  Michie  (1971)  presented 
search-theoretical results for a general problem-solving program (GT4),
which  uses  this  type  of  generator  function. 
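
A generator function with this kind of “memory” maps naturally onto a Python generator, which produces one successor per request and can be resumed later. The sketch below is an assumption-laden illustration and not a description of GT4: the operators, their ordering, and the `calls` log are all hypothetical.

```python
calls = []   # log of operator applications, to show that generation is lazy

def make_operator(name, fn):
    """Wrap an operator so that every application is recorded in `calls`."""
    def apply(node):
        calls.append(name)
        return fn(node)
    return name, apply

# Hypothetical operators, listed with the most promising first.
OPERATORS = [
    make_operator("double", lambda n: n * 2),
    make_operator("add1", lambda n: n + 1),
    make_operator("sub1", lambda n: n - 1),
]

def successors_one_at_a_time(node):
    """Yield (operator name, successor) pairs, applying one operator per request."""
    for name, apply in OPERATORS:
        yield name, apply(node)

generator = successors_one_at_a_time(3)
first = next(generator)            # only the most promising operator is applied
applied_after_first = list(calls)
second = next(generator)           # later, generation resumes where it stopped
```

Until the search procedure asks again, the less plausible successors are neither generated nor stored.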

The general problem-solving programs we have discussed in this
chapter (GPS, FDS, MULTIPLE, GT4; REF-ARF differs slightly, as we shall
see below) are all programs that can accept a finite description for an
arbitrary (S,F,G) state-space problem, and which use that description
to conduct a search through a portion of the state space of that problem.
Their generality resides in the fact that they can accept finite descriptions
for many different (S,F,G) problems, and can often find solution
paths for those problems. To a large extent, FDS and MULTIPLE are
able to develop their own evaluation functions. The limitations of these
problem  solvers  are  due  to  two  facts:  They  can  search  only  relatively 
small  state  spaces;  and  they  cannot  develop  a  better  finite  description 
for  a  state-space  problem  than  the  one  they  are  given — that  is,  they 
are  not  representation-independent. 

Although  we  have  not  discussed  applications  of  heuristic  search 
theory  to  problem-reduction  problems,  much  the  same  results  can  be 
obtained. It should be noted that the procedures for searching AND/OR
graphs are essentially the same as those for searching the state spaces
of two-person, nonchance, perfect-information games of strategy. (An
AND node corresponds to a move belonging to one’s opponent; an OR
node corresponds to a move belonging to oneself.) The discussion in
Chapter 4 is therefore relevant to search procedures for AND/OR state-space
problems. However, for a thorough treatment of heuristic search
theory  as  it  applies  to  problem-reduction  problems,  and  for  a  much  more 
extensive  discussion  of  the  material  in  this  section,  we  encourage  the 
reader to see the book by Nilsson (1971).

PLANNING, REASONING BY ANALOGY,
AND LEARNING

Planning 

In  its  remaining  two  sections,  this  chapter  discusses  some  important 
aspects of current AI research on problem solving. The topics discussed
in this section are “planning,” “learning,” and “reasoning by analogy”;
the next section discusses “models,” the “problem of problem representation,”
and the “levels of competence” that have been attained by
machine intelligences. These topics represent open questions for AI researchers,
rather than established theories. Space does not permit presentation
here in detail of the many viewpoints (and results!) that have
been  developed  about  these  topics,  although,  because  of  the  interde¬ 
pendence  of  aptitudes,  they  will  be  discussed  in  greater  detail  in  subse¬ 
quent  chapters.  At  this  point,  only  some  brief  summaries  and  references 
to  the  literature  are  presented. 

A  process  that  constructs  and  executes  plans  for  solving  problems 
is  said  to  be  a  planning  process.  As  emphasized  throughout  this  chapter, 
many  problems  are  partially  specified,  and  for  them  there  may  not  exist 
a  single  string  of  actions  or  operators  that  will  always  constitute  a 
solution;  therefore,  the  best,  initial  solution  is  often  a  plan.  Plans  for 
solving  such  problems  may  specify  a  wide  variety  of  different  actions, 
including “looking for outside help” and “making a new plan.” Furthermore,
these actions may be conditional; that is, a plan might include
statements of the form “if X happens, then do Y; otherwise do Z.” Or,
they may include loops and recursion such as: “Step 1. Put money in
the jukebox and punch-a-button; if it doesn’t play what you want, then
go to step 1; otherwise, go to step 2”; “If at first you don’t succeed . . .”;
“Move block (x). If x does not support anything, then pick it up and
move it. Otherwise, for all y such that x supports y, first do move block
(y).”
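
The recursive “move block” plan can be written almost word for word as a procedure. In this sketch the `supports` table and the wording of the recorded actions are hypothetical, and the sketch assumes that x itself is moved once everything it supported has been moved.

```python
def move_block(x, supports, actions=None):
    """Move block x, first moving every block that x supports."""
    if actions is None:
        actions = []
    for y in list(supports.get(x, [])):
        move_block(y, supports, actions)   # first do move block (y)
        supports[x].remove(y)              # x no longer supports y
    actions.append(f"pick up and move {x}")
    return actions

# Hypothetical stack: A supports B and C, and B supports D.
supports = {"A": ["B", "C"], "B": ["D"]}
plan_trace = move_block("A", supports)
```

The order of the recorded actions (D, B, C, then A) is not written into the plan; it follows from the situation the plan meets when it is executed.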

Again,  the  state  spaces  of  some  problems  are  extremely  large,  and 
the shortest solution-path for such a problem might be very long. A
plan  for  such  a  problem  might  specify  subgoals  along  the  solution-path, 
and  instruct  the  computer  to  search  first  for  a  path  to  the  shallowest 
subgoal,  and  then  for  a  path  from  that  subgoal  to  the  next,  etc.  (See 
McCarthy’s  San  Diego  Problem  in  Exercise  3-11.)  This  is  often  re¬ 
ferred  to  as  the  “milepost”  paradigm  for  plans. 
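
A “milepost” plan can be sketched as a list of subgoal tests that are searched for one after another. Everything below is an illustrative assumption: the integer state space, the cap of 100 that keeps it finite, and the use of a simple breadth-first search for each leg of the journey.

```python
from collections import deque

def shortest_path(start, successors, is_goal):
    """Breadth-first search for a path from start to a node satisfying is_goal."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if is_goal(node):
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in successors(node):
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None

def follow_mileposts(start, successors, mileposts):
    """Search for each subgoal in turn, starting from the previous one."""
    path = [start]
    for reached in mileposts:
        leg = shortest_path(path[-1], successors, reached)
        if leg is None:
            return None
        path.extend(leg[1:])
    return path

# Hypothetical problem: add one or double, with states capped at 100.
capped = lambda n: [m for m in (n + 1, n * 2) if m <= 100]
route = follow_mileposts(1, capped, [lambda n: n == 10, lambda n: n == 41])
```

Each leg is a short search, so the plan trades a guarantee of the overall shortest path for a large reduction in the number of nodes generated.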

AI  research  on  plans  and  planning  may  be  divided  rather  naturally 
into three categories: paradigms for the concept of plan; computer
execution of plans; computer development of plans (which is what we
should  properly  call  “planning”).  The  degree  of  success  attained  by  the 
research  into  these  categories  corresponds  roughly  to  the  order  in  which 
they  are  enumerated.  Thus,  a  number  of  paradigms  have  been  developed 
and  many  of  them  can  be  transformed  into  something  that  a  computer 
can  execute,  but  very  little  success  has  (so  far)  been  obtained  in  having 
computers  develop  their  own  plans. 

The  preceding  paragraphs  summarize  roughly  the  characteristics 
that  have  been  proposed  for  “plan”  by  the  various  paradigms  that  have 
been  developed;  these  characteristics  may  be  condensed  into  something 
of  a  formal  definition :  A  plan  is  a  collection  of  procedures  together  with 
specifications  for  when  those  procedures  should  be  used  (i.e.,  “called”). 
Each  of  the  various  paradigms  is  a  formalization  of  this  idea  in  more 
detail  for  a  specific  problem  domain.  Perhaps  the  most  extensive  for¬ 
malization  is  that  provided  by  Hewitt  (1968  et  seq.),  which  is  discussed 
further  in  Chapter  6.  Other  paradigms  for  “plan”  are  explored  by  Doran 
(1970)  and  Michie  (1971). 

Of  course,  there  is  a  “strange  paradox”  here,  because  we  have  used 
the  same  words  (in  essentially  the  same  phrases)  to  talk  about  the  con¬ 
cept  of  “problem.”  Thus,  a  problem  is  a  collection  of  procedures  (oper¬ 
ators)  together  with  specifications  for  how  they  shall  be  used  to  con¬ 
struct  a  state-space  graph,  and  information  as  to  which  paths  in  the 
state  space  are  solution  paths.  The  concepts  of  “problem”  and  “plan” 
may both be formalized by reference to procedures and their interactions
with data-structures. Thus, one of the paradigms for “plan” is to
see them as “nondeterministic” programs, whereas the REF-ARF general
problem-solving program (Fikes, 1970) is designed to correspond problems
with such programs. For a description and discussion of nondeterministic
programs, see Manna (1970b). (A similar, but less well
formalized  paradigm  for  “plan”  corresponds  plans  to  “fuzzy”  programs; 
see  Zadeh,  1968.) 

In passing, it should be noted that a program has been written
which uses the “milepost” paradigm for plans in a procedure that evidently
is capable of solving the problems of the 15-Puzzle. The program
was written by Ashok K. Chandra of Stanford University at the request
of John McCarthy. The “mileposts” or subgoals used by the program
correspond to the correct placement of the blocks of the puzzle in successive
“gnomons” of the tray (see Fig. 3-13). The program conducts an
ordered  search  through  the  state  space  of  the  15-Puzzle,  attempting 
first  to  correctly  place  the  blocks  in  the  outer  gnomon,  then  the  next 
outer,  etc.  Moreover,  the  search  conducted  by  Chandra’s  program  is 
bidirectional. At the present author’s request, this program was run on
a sample of about 300 randomly generated 15-Problems (start-node
plus goal-node in the state-space), and found a solution-path for each
problem that was solvable. Usually, the program required about 10
seconds to solve a given 15-Problem, and in doing so expanded fewer
than a thousand nodes of the state-space.

Figure 3-13. Gnomons.

Reasoning  by  Analogy 

The  importance  of  designing  problem  solvers  with  an  ability  to 
“reason  by  analogy”  has  been  stressed  by  a  number  of  investigators.  A 
number  of  basic  kinds  of  analogies  were  identified  by  Kling  (1971), 
discussed further in Chapter 6. The earliest program for “analogical
reasoning”  was  that  of  Evans  (1963).  Winston  (1970)  presented  an 
elegant  formalization  for  the  concept  of  “analogy”  and  showed  how  the 
results  of  Evans  (1963)  can  be  extended  to  three  dimensions  (“or 
more”;  see  Chapter  5).  Becker  (1969)  discussed  “semantic  analogies,” 
and  Ramani  (1971)  presented  a  program  that  answers  questions  “by 
analogy.” 

Learning 

AI  research  has  developed  many  paradigms  for  the  concept  of 
“learning.”  Learning  for  state-space  problems  may  be  formalized  as  a 
process  that  finds  suitable  evaluation  functions  and  generator  functions 
for  an  ordered  search  procedure.  Samuel  (1959,  1967)  provided  the 
classic  treatment  of  this  type  of  learning,  in  which  the  nature  of  an 
evaluation  function  for  nodes  in  the  game-tree  of  Checkers  is  changed 
by a checker-playing program, based on its previous experience with
the game (see Chapter 4). Hewitt (1968 et seq.) presented the paradigm
of functional abstraction for learning, and discussed some ways it
might be utilized by PLANNER programs (see Chapter 6). McCarthy
(1963),  Minsky  (1968)  and  Winograd  (1968  et  seq.)  emphasized  the 
fact  that  much  of  the  learning  done  by  humans  results  from  their  being 
taught  various  procedures  by  other  humans,  and  stressed  the  desirability 
of  incorporating  “communication”  into  a  general  paradigm  for  learning 
(Chapter  7). 

Since  we  may  correspond  learning  to  the  development  of  the  evalu¬ 
ation  and  generator  functions  for  state-space  search  procedures,  and 
since  such  functions  correspond  to  the  heuristic  information  these  pro¬ 
cedures  may  use,  it  is  natural  to  think  of  learning  as  a  process  of 
“heuristic  development”  for  these  search  procedures.  In  a  sense,  a  pro¬ 
gram  that  modifies  the  evaluation  function  used  by  its  search  procedure 
is  developing  its  own  heuristics.  However,  it  should  be  stressed  that  it 
is  difficult  (unless  the  program  is  “self-affecting”;  see  Chapter  8)  to  say 
that  such  a  program  is  really  developing  its  own  heuristics:  The  process 
(program)  by  which  heuristics  are  developed  is  itself  a  heuristic. 


MODELS,  PROBLEM  REPRESENTATIONS, 
AND  LEVELS  OF  COMPETENCE 

Models 

Throughout  this  book,  and  especially  in  Chapter  7,  the  role  of 
“model-making”  in  artificial  intelligence  will  be  emphasized.  In  theorem¬ 
proving  terminology  (Chapter  6),  a  model  is  a  particular  interpretation 
of  a  statement,  or  of  a  set  of  statements.  (A  set  of  statements  may  have 
more  than  one  model.)  Any  statement  that  is  logically  implied  by  a  set 
of  statements  with  a  given  model  will  hold  true  for  that  model,  but  any 
statement that does not hold true for the model cannot be logically implied
by the set. This fact may be used as a device for recognizing non-derivable
statements: A particular instance of a candidate statement
may  be  compared  against  a  model;  if  it  is  found  to  be  false,  then  we 
know  the  candidate  statement  cannot  be  derived  from  the  set  of  state¬ 
ments  for  which  our  model  holds.  A  theorem-proving  procedure  may 
therefore be designed to reject automatically certain® statements it cannot
hope to derive, if it has a means of developing models and using
them for comparisons. Rather than statements, we may think of this
process as discarding possible successor nodes of a given node in a
problem-reduction problem (see Nilsson, 1971). Gelernter (1959) presented
a landmark testing of this concept in a program for proving
theorems about plane geometry. For his program the use of models
enabled, on the average, all but 5 out of 1,000 of the successors to a
given node to be rejected.

® . . . but not all, thanks to Gödel (1931).
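
The use of a model as a filter can be illustrated with propositional statements. In this sketch, statements are represented as Python functions of a truth assignment; the axioms, the model, and the names `holds_in` and `can_reject` are hypothetical and are not meant to describe Gelernter's program.

```python
def holds_in(model, statements):
    """True when every statement is true under the given truth assignment."""
    return all(statement(model) for statement in statements)

def can_reject(candidate, axioms, model):
    """True when the candidate is false in a model of the axioms, so that it
    cannot be logically implied by them."""
    if not holds_in(model, axioms):
        raise ValueError("the assignment is not a model of the axioms")
    return not candidate(model)

# Hypothetical axioms: "p implies q" and "p".
AXIOMS = [lambda m: (not m["p"]) or m["q"], lambda m: m["p"]]
MODEL = {"p": True, "q": True, "r": False}

reject_r = can_reject(lambda m: m["r"], AXIOMS, MODEL)
reject_q = can_reject(lambda m: m["q"], AXIOMS, MODEL)
```

The statement r is false in the model and can be discarded at once. The statement q survives the test, which shows only that this model does not rule it out, not that it has been derived.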


The  Problem  of  Problem  Representation 

We  have  emphasized  the  desirability  that  general  problem  solvers 
be  representation-independent;  that  is,  capable  of  developing  their  own 
problem  representations.  No  one  has  yet  succeeded  in  giving  represen¬ 
tation  independency  to  computers.  However,  Amarel  (1968  et  seq.) 
charted  part  of  the  basic  mathematics  necessary  for  such  a  task,  and 
showed  how  a  sequence  of  successively  better  problem  representations 
for  the  Missionaries  and  Cannibals  Problem  (see  the  Exercises)  can  be 
produced,  using  the  concepts  of  macro-operators  and  macrostates.  Be¬ 
cause  of  the  fact  that  problems  can  be  represented  by  programs,  it  is 
possible  to  treat  the  problem  of  problem-representation  from  a  pro¬ 
cedure-oriented  point  of  view  (see  Hewitt,  1968  et  seq.)  in  which  the 
problem  of  developing  (correct,  improved)  problem  representations  is 
equivalent  to  that  of  developing  (correct,  improved)  programs  (see 
Chapter  6).  Again,  we  may  see  it  as  a  problem  of  learning  languages 
for  problem  description. 

Levels  of  Competence 

Currently it may be said that AI research has produced the following
“skillful” programs, which perform tasks with an aptitude that
people  normally  correspond  to  that  of  a  very  practiced  human  intelli¬ 
gence: 

Samuel’s Checkers Player
Greenblatt’s Chess Player
The symbolic integration programs of Slagle, Moses, and Risch
Feigenbaum’s DENDRAL^
Wasserman’s Bridge Bidder
Chandra’s 15-Program

^ DENDRAL is a heuristic program that infers the structure of molecules
from their mass spectrographs. Its performance compares favorably to that of
graduate chemistry students (see Feigenbaum et al., 1971).

(The list is not exhaustive.) The aptitude of these programs for their
tasks has been verified by direct comparison with the abilities of humans
who are known to be “skillful” at performing the same tasks. Thus,
Samuel’s Checkers Player is able to outplay all but the very best human
Checker  players.  More  modest  claims  must  usually  be  made  for  the 
other  “skillful”  programs. 

However, an important point should be noted: All these skillful
programs  are  highly  specific  to  their  particular  problems.  At  the  moment 
there  are  no  general  problem  solvers,  general  game  players,  etc.,  which 
can  solve  really  difficult  problems  (e.g.,  the  15-Puzzle)  or  play  really 
difficult  games  (e.g.,  Checkers  or  Chess)  with  a  skill  approaching  that 
of  human  intelligence.  And,  it  goes  without  saying,  there  are  no  general 
programs  that  can  learn  to  perform  these  difficult  tasks  skillfully  (note 
3-2). 


NOTES 

3-1. The IPL system was developed in 1956 by Allen Newell, J. C. Shaw,
and Herbert Simon; LISP was developed in 1960 principally by John
McCarthy. The importance of good programming languages to the development
of AI research cannot be overestimated; AI research could not
really get off the ground without IPL and LISP. It is doubtful that one of the
most significant recent developments (Winograd’s work on natural languages)
could have been obtained without PLANNER and a similar programming
language called PROGRAMMAR. Programming languages are
discussed further in Chapters 6 and 7.

3-2. An empirically based theory that has been produced by many AI
researchers is that the more general the aptitude possessed by a machine,
the  less  efficient  is  its  performance  of  the  tasks  that  the  aptitude  enables  it  to 
perform.  Thus,  Newell,  Shaw,  and  Simon  noted  that  their  General  Prob¬ 
lem  Solver  was  less  efficient  at  solving  the  problems  it  could  solve  than 
would  have  been  programs  specifically  designed  for  solving  each  of  those 
problems.  This  relation  between  generality  and  efficiency  has  been  con¬ 
firmed  by  the  other  general  problem  solvers  mentioned  in  this  chapter.  How¬ 
ever,  there  is  room  for  doubting  that  the  relation  is  a  “real”  one;  perhaps  it 
is  possible  to  design  general  problem  solvers  that  can  learn  to  solve  the 
problems  in  a  given  problem-domain  more  and  more  efficiently  (for  in¬ 
stance,  the  ability  of  people  to  learn  to  solve  crossword  puzzles)  and, 
within  a  short  time,  approach  the  efficiency  of  problem  solvers  designed 
specifically  to  solve  the  problems  of  that  domain. 

3-3.  The  principle  is  basically  that  suggested  by  McCarthy  (1956), 
namely,  that  “the  enumeration  of  partial  recursive  functions  should  give  an 
early place to compositions of the functions which have already appeared.”
In this early paper, McCarthy suggested the system-inference paradigm,
using the following argument: A problem should be something that has
solutions. By a “well-defined problem” is meant one for which there is a
definite  test  that  verifies  whether  a  proposed  solution  is  correct.  If  a  pro¬ 
posed  solution  is  not  correct,  then  the  test  may  either  reject  it  or  not 
terminate,  but  if  the  solution  is  correct,  then  the  test  must  always  verify  it 
in  a  finite,  though  possibly  variable,  time.  Let  us  regard  the  test  as  being 
carried  out  by  a  Turing  machine  T,  which,  given  as  input  a  proposed  solu¬ 
tion  (or  description  of  a  proposed  solution),  will  output  in  a  finite  time  an 
affirmative  symbol  r  if  the  proposed  solution  is  correct.  The  statement  of 
the  problem  then  consists  in  a  description  of  T  and  the  designation  of  r;  a 
solution  to  the  problem  is  any  input  string  i  such  that  T(i)  =  r.  A  general
problem  solver  is  a  machine  that,  given  the  description  for  the  mth  Turing
machine  T_m,  will  compute  an  i  such  that  T_m(i)  =  r,  if  in  fact  there  is  such
an  i.  Since,  given  m,  one  can  construct  the  description  of  the  mth  Turing
machine,  a  general  problem  solver  can  be  said  to  compute  a  function  g  on
two  inputs  (m,r)  such  that  T_m(g(m,r))  =  r;  g,  of  course,  is  to  be  a  partial
function,  not  defined  for  all  m  and  r  (see  also  McLamore,  1968).
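The generate-and-test reading of this argument can be sketched in Python. This is a minimal illustration under stated assumptions, not McCarthy's own construction: the test T is modeled as a Python generator that yields once per computation step and returns True when it verifies a candidate (it may instead return False, or never finish), and the names `strings`, `accepts_within`, `solve`, and `example_test` are hypothetical. Because the test need not terminate on incorrect candidates, the solver interleaves candidates and step budgets instead of running any single test to completion.

```python
from itertools import count, islice, product


def strings(alphabet):
    """Enumerate every finite string over `alphabet`, shortest first."""
    for length in count(0):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def accepts_within(test, candidate, steps):
    """Run the step-wise test for at most `steps` steps.

    Returns True only if the test finishes in time and verifies the candidate.
    """
    run = test(candidate)
    for _ in range(steps):
        try:
            next(run)
        except StopIteration as done:
            return done.value is True
    return False  # budget exhausted; the test may or may not ever halt


def solve(test, alphabet, max_budget=None):
    """Dovetail over candidates and step budgets; return a verified candidate."""
    budgets = count(1) if max_budget is None else range(1, max_budget + 1)
    for budget in budgets:
        for candidate in islice(strings(alphabet), budget):
            if accepts_within(test, candidate, budget):
                return candidate
    return None  # only reachable when max_budget is given


def example_test(candidate):
    """Hypothetical test: verifies only the string "110"."""
    if candidate.startswith("0"):
        while True:          # deliberately never terminates on these inputs
            yield
    for _ in candidate:      # one step per symbol read
        yield
    return candidate == "110"
```

With `max_budget=None` the call `solve(test, alphabet)` does not return when no solution exists, which mirrors the remark that g is a partial function.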

3-4.  An  alternate  statement  for  the  state-space  paradigm  is  given  in 
Sandewall  (1969).  His  formalization  is  briefly  described  here.  To  distinguish
it  from  the  (S,F,G)  formalization,  let  us  call  it  the  (S',F',G')  formalization.

The  notion  of  state  is  the  same  as  in  the  text;  that  is,  a  state  is  a  data
structure.  For  the  sake  of  simplicity,  we  shall  call  a  collection  (or  set)  of
states  a  particle  (physicists  beware!),  and  say  that  the  states  in  a  particle
exist.  An  operator  is  a  computational  rule,  which  can  be  applied  to  existing 
states  in  a  particle  to  produce  new  states  that  will  also  exist  in  the  particle. 
An  operator  can  either  change  the  states  to  which  it  is  applied  or  it  can  re¬ 
move  some  (perhaps  all)  of  them  from  the  particle,  or  it  can  simply  add 
new  states  to  the  particle.  A  description  of  an  (S',F',G')  state-space  problem
is  the  specification  of  three  things:  S',  an  initial  or  starting  particle;  F',  a  set
of  operators;  and  G',  a  desired  or  goal  particle.  A  solution  to  an  (S',F',G') 
problem  is  the  specification  of  a  finite  sequence  of  operators  and  of  how  they 
are  to  be  applied  to  S'  and  its  successors  such  that  G'  will  be  produced. 
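One way to picture the (S',F',G') formalization is the following Python sketch. It is only an illustration under assumptions that are not taken from Sandewall's paper: a particle is represented as a frozenset of states, an operator as a function from particle to particle, and "G' is produced" is read as "every state of G' exists in the current particle." The names `solve` and `OPERATORS`, and the toy integer states, are hypothetical.

```python
from collections import deque


def solve(start, operators, goal, max_depth=6):
    """Breadth-first search over particles (sets of states).

    Returns a list of operator names that turns `start` into a particle
    containing every state of `goal`, or None if none is found in time.
    """
    start = frozenset(start)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        particle, path = frontier.popleft()
        if goal <= particle:            # assumed reading of "G' is produced"
            return path
        if len(path) >= max_depth:
            continue
        for name, operator in operators.items():
            successor = frozenset(operator(particle))
            if successor not in seen:
                seen.add(successor)
                frontier.append((successor, path + [name]))
    return None


# Toy operators on integer states: one adds new states, one removes states.
OPERATORS = {
    "double": lambda particle: particle | {2 * x for x in particle},
    "drop_odd": lambda particle: {x for x in particle if x % 2 == 0},
}
```

For example, `solve({1}, OPERATORS, {4})` searches the particles reachable from {1} and reports the operator sequence that first yields a particle containing 4.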

Sandewall  (1969)  discussed  various  types  of  operators  and  formulated 
a  theory  of  heuristic  search  for  this  type  of  problem.  Also,  he  suggested  that 
the  proper  representation  for  the  possible  ways  of  going  from  one  particle 
to  another  is  in  terms  of  lattices  rather  than  trees  or  graphs. 

3-5.  The  15-Puzzle  was  invented  in  1878  by  Sam  Loyd  (see  Loyd,  1960) 
and  was  extremely  popular  during  its  early  years,  especially  in  Europe. 
Kasner  and  Newman  (1956)  reported  that  employers  were  forced  to  post 
notices  forbidding  their  workers  to  play  the  puzzle  during  working  hours. 
Loyd  and  others  offered  huge  prizes  to  anyone  capable  of  solving  some  of 
the  unsolvable  varieties  of  the  puzzle.  Some  commentators  at  the  time  con¬ 
sidered  the  15-Puzzle  to  be  a  threat  to  society,  attributing  to  it  “untold 
headaches,  neuroses,  and  neuralgias.”

3-6.  Equivalently,  we  can  say  a  graph  G  is  an  ordered  pair  (N,R),
where  N  is  a  set  of  nodes  x,  y,  . . . ,  and  R  is  a  set  of  binary  relations  r_1,  r_2,  . . .
on  the  nodes.  If  r_i(x,y)  holds  for  a  given  x  and  y  in  N,  then  we  say  “x  connects
to  y  under  the  relation  r_i.”

As  used  here,  “graph”  is  a  generalization  of  one  of  the  most  common
applications  of  the  word:  graph  of  a  function.  The  graph  of  a  function,  say,
f:  X  →  Y,  is  a  pictorial  description  of  the  relation  r  such  that  r(x,y)  is  true
if  y  =  f(x).

3-7.  Much  of  SIN’s  time  advantage  is  due  to  the  fact  that  it  was  run  mostly
in  a  compiled  mode,  whereas  SAINT  was  run  mostly  in  an  interpretive  mode.
(For  an  explanation  of  these  terms,  see  Knuth,  1969.)  Moses  estimated 
that  his  program  is  actually  about  three  times  faster  than  Slagle’s. 

More  recently,  Risch  (1969)  developed  an  algorithmic  procedure  for 
solving  a  wide  class  of  symbolic  integration  problems;  Risch’s  procedure 
does  not  need  to  generate  a  problem-tree  and  is  guaranteed  to  always 
produce  correct  solutions  (it  might  thus  be  said  to  display  a  “perfect  apti¬ 
tude”  for  its  problem-domain).  An  introduction  to  the  Risch  algorithm  and 
a  summary  of  the  current  state  of  work  on  symbolic  integration  programs 
was  provided  by  Moses  (1971). 

These  programs,  incidentally,  are  distinct  from  “numerical  integration” 
programs,  which  typically  compute  numerical  approximations  to  the  values 
of  definite  integrals.  Finally,  SAINT  is  an  acronym  for  “Symbolic  Automatic
INTegrator,”  and  SIN  stands  for  “Symbolic  INtegrator.”

EXERCISES 

In  Exercises  3-1  through  3-10,  and  in  Exercise  3-13,  first  solve  the  problems
that  are  given.  Next,  make  a  list  of  the  subproblems  you  considered  while  solving 
them.  Discuss  how  a  computer  might  be  programmed  to  solve  each  of  the  given 
problems,  and  how  each  of  the  problems  might  be  represented  to  the  computer. 
If  you  find  a  state-space  representation  for  a  problem,  estimate  the  size  of  the 
state  space  and  try  to  identify  heuristics  and  algorithms  the  machine  could  use 
to  search  it.  If  computer  time  is  available  to  you,  choose  a  problem  and  try  to 
implement  a  computer  program  that  can  solve  it. 

3-1.  Find  your  way  out  of  the  Maze  of  Dedalus.



3-2.  (The  Missionaries-and-Cannibals  Problem.)  (a)  Three  missionaries  and  three
cannibals  are  all  on  one  bank  of  a  river  they  wish  to  cross.  They  have  a  boat, 
which  will  hold  two  persons,  but  which  can  be  rowed  by  one  if  necessary.  If  the 
cannibals  ever  outnumber  the  missionaries  on  a  given  bank,  all  the  missionaries 
on  that  bank  will  be  eaten.  Otherwise,  both  parties  will  cooperate  peacefully 
toward  crossing  the  river.  How  can  all  the  missionaries  and  cannibals  be  trans¬ 
ported  safely  to  the  other  bank?  (b)  Consider  the  general  case  in  which  there  are 
m  missionaries  and  n  cannibals  (m  >  n),  and  in  which  the  boat  can  hold  p 
persons,  but  requires  at  least  r  persons  to  be  rowed  (p  >  r).

3-3.  (The  Confusion-of-Patents  Problem.)  A  certain  patent  attorney  was
astonished  when  he  received  the  simultaneous  allowance  of  five  patents,  for  five 
separate  clients,  each  of  whom  lived  in  a  different  city. 

His  astonishment  turned  to  chagrin,  however,  when  he  learned  what  had 
happened  to  the  patents.  They  had  been  received  in  his  office  on  the  same  day, 
but  because  of  an  error  made  by  a  new  clerk,  they  were  sent  out  in  wrong 
envelopes.  Each  client  received  a  patent,  but  not  his  own. 

The  inventor  of  the  steam  shovel  received  the  mousetrap  patent,  while  the 
inventor  of  the  latter  found  in  his  mail  the  papers  that  should  have  gone  to  Mr. 
Green.  Mr.  Blue  received  the  patent  for  the  rumbleseat  awning.  Mr.  Black’s  patent 
was  sent  to  Chicago;  the  patent  that  should  have  gone  was  sent  to  Boston. 

Mr.  Brown  had  the  patent  intended  for  New  York.  Mr.  White  had  Mr.
Brown’s  patent.  The  non-refillable  bottle  patent  was  sent  to  Los  Angeles;  the 
inventor  of  the  bottle  received  the  patent  of  the  Cleveland  client,  while  in 
Cleveland  the  surprised  client  received  a  patent  for  an  antisnore  device. 

Who  should  have  received  what  where? 

3-4.  (  T raveling-Salesman  Problems. ) 

(a)  For  the  map  shown  below,  find  the  shortest  path  that  starts  at  city  A, 
visits  each  of  the  other  cities  only  once,  and  then  returns  to  A. 


*  From  Richard  E.  Fikes,  “REF-ARF:  A  System  for  Solving  Problems 
Stated  as  Procedures.”  Artificial  Intelligence  Journal,  Vol.  1  (1970),  pp.  27-120. 
Reprinted  with  permission.

(b)  Find  the  path  from  start  to  finish,  which  passes  once  through  all  the
nodes  lettered  a  through  «: 


3-5.  (Crypt  Addition.)  Assign  a  decimal  digit  to  each  of  the  letters  in  the  words 
“send,”  “more,”  and  “money,”  such  that  when  the  letters  are  replaced  by  the 
corresponding  digits  the  following  summation  is  true: 

     send
  +  more
  -------
    money

No  digit  may  be  assigned  to  more  than  one  letter,  and  leading  zeros  are  not 
allowed  in  the  numbers  formed  by  “send,”  “more,”  and  “money.” 

3-6.  (Water-Jug  Problems.) 

(a)  Given  a  3-gallon  jug  and  a  4-gallon  jug,  how  can  precisely  2  gallons 
be  put  into  the  4-gallon  jug?  There  is  a  sink  nearby,  such  that  either  jug  can  be 
filled  from  the  tap  and  its  contents  can  be  poured  down  the  drain.  Also,  water 
can  be  poured  from  either  jug  into  the  other,  but  the  jugs  themselves  are  the  only 
measuring  devices  available. 

(b)  Given  a  5-gallon  jug  and  an  8-gallon  jug,  how  can  precisely  2  gallons
be  put  into  the  5-gallon  jug?  The  conditions  for  this  problem  are  the  same  as 
those  in  (a).

(c)  Given  a  5-gallon  jug  filled  with  water  and  an  empty  2-gallon  jug,  how
can  precisely  1  gallon  be  obtained  in  the  2-gallon  jug?  In  this  problem,  water  may 
be  either  discarded  or  poured  from  one  jug  into  another,  but  there  is  no  source 
of  water  other  than  the  initial  5  gallons.  Again,  the  jugs  themselves  are  the  only 
measuring  devices  to  be  used. 

3-7.  (The  Monkey-and-Bananas  Problem.)  A  monkey  is  in  a  room  where  a
bunch  of  bananas  is  hanging  from  the  ceiling,  too  high  to  reach.  In  the  corner 
of  the  room  is  a  box,  which  is  not  under  the  bananas.  The  box  is  sturdy  enough 
to  support  the  monkey  and  light  enough  so  that  he  can  move  it  easily.  If  the  box 
is  under  the  bananas  and  the  monkey  is  on  the  box,  he  will  be  able  to  reach  the 
bananas.  How  can  the  monkey  get  the  bananas  (if  he  wants  them)? 

3-8.  (The  Mutilated  Checkerboard.)  Show  that  it  is  impossible  to  completely
cover  the  “mutilated-checkerboard”  with  1x2  tiles  so  that  the  tiles  neither  over¬ 
lap  nor  stick  out  over  the  edge  of  the  board. 


3-9.  (The  Tower-of-Hanoi  Problem.)  Initially  three  disks  of  different  sizes,  each
having  a  hole  in  its  center,  are  placed  as  shown  in  the  diagram  below,  all  of 
them  about  one  of  three  pegs.  It  is  desired  to  transfer  their  initial  configuration 
to  the  third  peg,  moving  them  one  at  a  time  in  such  a  way  that  only  the  top  disk 
on  a  peg  is  ever  the  disk  being  moved  and  a  larger  disk  is  never  placed  on  top  of 
a  smaller  disk.  How  can  this  be  done? 


[Figure:  three  pegs  numbered  1,  2,  3,  shown  in  the  Start  and  Goal  configurations.]


3-10.  (The  Sliding  Block  Puzzle.)  Nine  blocks  are  placed  in  a  tray  as  shown
below,  (a)  How  many  different  configurations  of  the  blocks  may  be  obtained  by 
sliding  them  about  in  the  tray?  (b)  How  many  different  configurations  of  the 
puzzle  are  there  if  configurations  that  may  be  obtained  from  each  other  by 
rotating  or  flipping  the  tray  are  considered  to  be  the  same?  (c)  Design  a 
computer  program  that  can  explore  the  state-space  of  the  sliding-block  puzzle.

3-11.  (The  San  Diego  Problem.)  You  have  a  road  map  for  the  area  surrounding
your  present  location;  however,  because  the  map  was  produced  by  the  SuperDuper
gas-station  chain,  it  shows  only  the  roads  in  a  30-mile  circle,  the  north  and
east  directions,  the  locations  of  two  SuperDuper  gas  stations,  and  your  present 
location.  Actually,  you  want  to  go  to  San  Diego,  which  you  know  to  be  400 
miles  to  the  south.  How  might  you  get  there,  if  you  know  how  to  drive,  have 
a  car,  and  sufficient  money  for  gas,  food,  and  lodging  along  the  way? 


3-12.  Suppose  we  are  given  that  nodes  B,  C,  and  D  in  Fig.  3-8  represent
trivially  solvable  problems,  (a)  What  can  be  said  about  the  solvability  of  node 
A?  (b)  What  if  B,  C,  and  D  are  unsolvable? 

3-13.  (Peg  Solitaire.)  A  board  contains  33  standard-size  holes  in  which  have 
been  placed  32  standard-size,  removable  pegs.  The  goal  is  to  remove  the  pegs
in  such  a  way  that  a  board  is  obtained  which  contains  only  one  peg,  placed  in
the  (initially  empty)  center  hole.  Pegs  may  be  removed  only  by  “jumping” 
them,  as  in  checkers;  that  is,  a  peg  A  may  be  removed  if  and  only  if  there  is  a 
peg  B  next  (left,  up,  right,  or  down)  to  it  and  an  empty  hole  C  on  the  opposite 
side,  and  the  removal  of  peg  A  is  accompanied  by  the  placing  of  peg  B  in  hole  C. 

3-14.  Why  should  a  depth-first  search  procedure  always  expand  the  most 
recently  generated  node  first? 


Checkerboard  pattern.  (Reproduced  with  permission,  D.  K.  Robbins.)

GAME  PLAYING


INTRODUCTION 

In  this  chapter  we  investigate  the  ability  of  computers  to  play 
games.  First  the  nature  of  the  games  that  computers  are  able  to  play 
will  be  reviewed.  Then  the  way  in  which  computers  may  make  use  of 
heuristic  search  techniques  in  order  to  play  these  games  will  be  de¬ 
scribed.  These  discussions  will  lead  us  to  computer  programs  that  are 
capable  of  playing  Checkers,  Chess,  GO,  Poker,  and  Bridge.  The  chapter
concludes  with  a  brief  explanation  of  “general”  game-playing  programs. 


GAMES  AND  THEIR  STATE  SPACES 

Some  of  the  most  important  programs  produced  by  AI  research  are
those  that  simulate  the  human  ability  to  play  games.  Games  comprise  a
general  class  of  problem  concerned  with  reasoning  about  actions.  They 
can  be  constructed  with  or  without  an  element  of  chance  involved,  and 
they  can  be  designed  so  as  to  specify  that  different  players  will  have 
different  information  available  to  use  in  deciding  how  to  play.  Finally, 
games  offer  the  possibility  of  a  direct  comparison  between  the  abilities 
of  machines  and  humans. 

It  is  probably  wise  to  remark  that  all  games  that  computers  can 
now  play  are  of  the  type  that  is  generally  known  as  “games  of  strategy” 
because  they  possess  well-defined  rules  and  objectives  for  each  player. 
Of  course  no  claim  is  being  made  that  these  are  the  only  games  that 
exist.  The  reader  is  probably  familiar  with  many  games  that  do  not
have  objectives  (and  perhaps  a  few  that  do  not  have  rules),  which  are
also  popular.  Since  the  ultimate  value  of  a  game  is  the  enjoyment  one 
gets  from  playing  it,  games  of  strategy  cannot  be  said  to  be  the  most 
valuable  ones  that  people  play.  Still,  they  are  the  ones  most  easily 
identified  with  the  use  of  intelligence,  in  its  role  as  an  ability  to  solve 
problems,  and  so  it  is  natural  that  games  of  strategy  should  be  studied 
extensively. 

Strategy

A  game  of  strategy  consists  of  a  sequence  of  moves,  each  of  which 
is  an  occasion  for  a  choice  between  certain  alternatives,  made  by  one 
of  the  players  of  the  game.¹  The  rules  of  the  game  specify  for  each  move
which  player  makes  the  move  and  what  his  alternatives  are.  These  rules 
are  finitely  describable,  and  are  to  be  known  to  each  of  the  players.  What 
the  rules  will  specify  usually  depends  on  the  previous  choices  made  in 
the  game.  For  each  move,  only  a  finite  number  of  alternatives  are  avail¬ 
able.  A  complete  sequence  of  choices  (one  that  the  rules  define  as 
terminating  the  game)  is  said  to  constitute  a  play  of  the  game.  In  some 
games  the  rules  will  sometimes  specify  that  the  choice  is  to  be  made  by 
chance,  in  which  case  the  players  are  usually  given  a  definite,  or  at  least 
computable,  probability  distribution  for  the  various  alternatives.  At 
each  move,  a  player  always  knows  completely  what  his  alternatives  are, 
but  he  may  not  know  completely  what  choices  have  been  made  previ¬ 
ously.  If  at  each  move  every  player  knows  completely  all  the  choices 
that  have  been  made  so  far  in  the  game,  it  is  said  to  be  a  game  of 
perfect  information.  Finally,  for  each  of  the  possible  plays  of  the  game, 
the  rules  specify  a  payment — which  may  be  positive,  negative,  or  zero 
— to  be  received  by  each  of  the  players.  The  objective  of  each  player  is 
to  maximize  the  payment  he  receives,  by  definition.  (If  a  player’s  payment
is  negative,  then  he  is  said  to  make  the  payment.)

¹  This  paragraph  comprises  a  brief  summary  of  the  basic  definitions  for
“games  of  strategy”  presented  by  von  Neumann  and  Morgenstern  (1944)  in
their  “theory  of  games”  (or,  simply,  “game  theory”).  The  word  “move”  is  used
in  their  game-theoretic  sense:  “It’s  your  move.”  In  some  games  (e.g.,  Chess)  it  is
common  to  use  the  word  in  an  additional  sense:  “He  moved  the  king’s  pawn
forward  two  spaces.”

These  statements,  of  course,  summarize  only  the  logical,  formal
aspects  of  games  of  strategy,  and  say  nothing  about  such  questions  as
how  a  given  game  might  be  implemented  physically,  how  the  moves
might  be  represented,  etc.  The  computer  programs  discussed  here  accept
symbolic  descriptions  of  the  choices  made  during  a  game,  and  when  the
rules  require  it,  they  produce  symbolic  descriptions  of  their  own
“choices”  (note  4-1).  Other  computer  programs  might  be  written  to
convert  the  symbolic  descriptions  produced  by  a  game-playing  program 
into  a  physical  action,  such  as  moving  a  pawn  forward  on  a  chessboard. 
Our  concern  in  this  chapter  is  only  with  computer  programs  that  handle 
the  “intellectual”  aspects  of  game  playing. 

In  general,  a  strategy  is  any  set  of  rules  that  tells  a  player  what 
choices  he  should  make  for  all  situations  that  might  arise  during  the 
course  of  a  game.  A  “good”  strategy  is  one  that  guarantees  its  user  will 
receive  a  high  payment,  or,  in  the  case  of  games  involving  chance,  it  is 
one  that  provides  for  a  high  “mathematically  expected”  (in  a  sense, 
probable)  payment.  Given  the  complete  description  of  a  game,  the 
theory  of  games  provides  a  computational  procedure  capable  of  de¬ 
termining  the  correct  strategies  for  all  players,  their  best  expectation  in 
playing  the  game,  etc. 
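The notion of a "mathematically expected" payment can be made concrete with a small Python sketch. It assumes, purely for illustration, that each strategy has already been reduced to a list of (probability, payment) pairs; the function names and the two toy strategies are hypothetical and do not come from the text.

```python
def expected_payment(outcomes):
    """Probability-weighted average payment over (probability, payment) pairs."""
    outcomes = list(outcomes)
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return sum(p * payment for p, payment in outcomes)


def best_strategy(strategies):
    """Name of the strategy with the highest expected payment."""
    return max(strategies, key=lambda name: expected_payment(strategies[name]))


# Two made-up strategies for one player in a game involving chance.
STRATEGIES = {
    "cautious": [(1.0, 1)],            # always receives 1
    "risky": [(0.5, 4), (0.5, -1)],    # receives 4 or pays 1, equally likely
}
```

Here the risky strategy has the higher expected payment even though it sometimes requires the player to make a payment, which is why "good" is defined in terms of expectation for games of chance.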

This  procedure  depends,  however,  upon  the  enumeration  of  all 
strategies  available  to  each  of  the  players  (including  the  strategies 
“chance”  might  use),  which  is  something  easy  to  describe  but  frequently 
difficult  to  perform.  Thus,  for  many  games,  the  number  of  strategies 
may  be  considered  “effectively  infinite,”  since  any  attempt  to  enumerate 
them  all  would  require  too  much  time.  As  we  shall  see  below,  this  is 
true  of  the  more  difficult  board  games  (Checkers,  Chess,  and  GO)  played
by  people.  Yet  people  seem  to  be  able  to  play  these  games  fairly  well
(note  4-2).  Throughout,  this  chapter  emphasizes  primarily  the  nature  of
the  strategies  that  computers  can  use  to  play  games  and  the  extent  to 
which  computers  can  be  enabled  to  select  their  own  strategies.  To  pave 
the  way  for  a  discussion  of  this  topic,  let  us  present  another,  very  similar 
formalization  for  “games  of  strategy.” 

State  Spaces

The  brief  description  of  games  given  above  can  be  rephrased,  using 
the  terminology  of  the  state-space  paradigm  for  problems  presented  in 
Chapter  3.  A  game  may  be  viewed  as  a  state-space  graph,  together  with 
a  function  associating  some  of  the  paths  through  the  graph  with  pay¬ 
ments  (positive,  negative,  or  zero)  to  be  received  by  the  players  of  the 
game.  The  nodes  or  states  of  the  graph  are  descriptions  of  the  moves  or 
situations  involved  in  the  game;  the  arcs  emanating  from  a  given  node 
(the  operators  applicable  to  a  given  state)  are  the  alternatives  associated 
with  the  corresponding  move.  Thus,  a  node  in  the  state  space  of  Check¬ 
ers  is  a  description  of  a  legal  configuration  of  pieces  on  a  checkerboard, 
together  with  an  indication  of  whose  move  it  is.  A  node  from  which 
no  arcs  emanate  (i.e.,  a  terminal  node)  is  one  for  which  the  game  ends.
It  is  common  to  indicate  that  a  certain  player  is  to  make  a  given  move
(choose  among  the  alternatives  associated  with  that  move)  by  drawing 
the  node  for  that  move  with  a  certain  shape  or  shading  that  is  different 
from  that  used  for  the  moves  belonging  to  the  other  players.  Terminal 
nodes  are  usually  drawn  with  the  shape  or  shading  of  the  player  whose 
move  they  would  be,  if  the  rules  make  a  specification,  even  though  they 
do  not  have  successors.  Thus,  Fig.  4-1  shows  the  state-space  graph  for
a  simple  game. 

Figure  4-1.  The  state-space  graph  for  a  simple  game.  (Legend:  #  chance;
O  player  1;  A  player  2;  □  player  3.)

A  game  of  strategy  begins  at  the  node  in  the  graph  labeled  “start.”
The  person  who  has  the  starting  move  (player  3  in  Fig.  4-1)  in  the
game  chooses  one  of  the  available  alternatives  (one  of  the  arcs  leaving
the  start  node)  and  “moves”  the  game  along  the  corresponding  arc  to
another  node  in  the  state  space;  say,  B  in  Fig.  4-1.  Node  B  represents 
the  move  by  player  2,  so  player  2  chooses  one  of  the  available  alterna¬ 
tives  and  “moves”  the  game  along  the  corresponding  arc  to  another 
node  in  the  state  space  (for  example,  node  C).  Node  C  represents  a 
move  that  is  to  be  made  by  “chance”;  the  probabilities  that  chance  will 
choose  the  various  available  alternatives  are  indicated  by  numbers  next 
to  the  corresponding  arcs  (each  number  must  be  between  zero  and  one, 
and  their  sum  must  equal  one).  The  game  continues  in  this  fashion  until 
it  is  moved  to  a  terminal  node  (e.g.,  node  D).  A  path  through  the  state 
space  that  leads  to  a  terminal  node  is  known  as  a  play  of  the  game.  If 
there  are  n  players  involved  in  a  game,  it  is  said  to  be  an  n-person  game. 
If  chance  is  involved  in  a  game,  it  is  said  to  be  a  game  of  chance;  other¬ 
wise  it  is  called  a  nonchance  game.  Thus,  Fig.  4-1  shows  the  state- 
space  graph  for  a  three-person  game  of  chance. 
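The way a play unfolds through such a graph can be sketched in Python. The graph below is a made-up three-person game of chance, not a transcription of Fig. 4-1, and every name in it (`Node`, `GAME`, `play`, `first_alternative`, `STRATEGIES`) is hypothetical. Each node records its owner (a player number or "chance") and its outgoing arcs; chance arcs carry probabilities that must sum to one.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    owner: object                              # player number, or "chance"
    arcs: dict = field(default_factory=dict)   # successor -> probability or None


# Made-up game: player 3 starts; node C is a chance move; D and F are terminal.
GAME = {
    "start": Node(3, {"B": None, "E": None}),
    "B": Node(2, {"C": None, "D": None}),
    "C": Node("chance", {"D": 0.25, "F": 0.75}),
    "E": Node(1, {"D": None}),
    "D": Node(1),
    "F": Node(2),
}


def play(graph, strategies, rng, start="start"):
    """Follow arcs from `start` to a terminal node; return the path (a play)."""
    path = [start]
    node = graph[start]
    while node.arcs:                           # terminal nodes have no arcs
        alternatives = list(node.arcs)
        if node.owner == "chance":
            weights = [node.arcs[name] for name in alternatives]
            if abs(sum(weights) - 1.0) > 1e-9:
                raise ValueError("chance probabilities must sum to one")
            chosen = rng.choices(alternatives, weights=weights)[0]
        else:
            chosen = strategies[node.owner](path, alternatives)
            if chosen not in alternatives:
                raise ValueError("illegal alternative")
        path.append(chosen)
        node = graph[chosen]
    return path


def first_alternative(path, alternatives):
    """A trivial strategy: always take the first available alternative."""
    return alternatives[0]


STRATEGIES = {1: first_alternative, 2: first_alternative, 3: first_alternative}
```

Calling `play(GAME, STRATEGIES, random.Random(0))` returns one path from "start" to a terminal node; the last step depends on the chance move at C. The sketch assumes the graph has no cycles, so every play terminates.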

The  strategic,  “problematic”  aspect  of  games  of  strategy  arises 
from  the  definition  of  a  payment  function,  which  specifies  that  certain 
paths  through  the  state  space  of  such  a  game  will  yield  payments  to  the 
players.  By  definition,  a  player  in  a  game  of  strategy  has  the  problem 
of  trying  to  insure  that  he  receives  a  high  payment  during  the  play  of 
the  game  that  actually  occurs.  Before  the  game  starts,  each  player  is 
assumed  to  have  been  given  a  complete,  finite  description  of  the  state- 
space  graph  and  of  the  payment  function  for  the  game.  A  player  “acts 
strategically”  when  he  makes  a  move  after  investigating  the  possible 
consequence  of  choosing  the  various  alternatives,  in  light  of  what  he 
knows  about  the  game  from  the  description  of  its  state  space  and  its 
payment  function,  and  in  light  of  what  he  knows  about  the  path  that 
has  so  far  been  taken  through  the  state  space  of  the  game.  The  game 
is  one  of  “perfect  information”  if  all  of  the  players  always  know  what 
path  has  been  taken;  otherwise  it  is  said  to  be  a  game  of  “imperfect 
information.”  Chess  and  Checkers  are  examples  of  perfect-information 
games.  Double-blind  Chess,  or  “Kriegspiel,”  is  an  example  of  an  imperfect-information
game  (note  4-3).  Bridge  and  Poker  are  also  examples  of  imperfect-information
games,  since  a  person  playing  them
usually  does  not  know  what  hands  are  held  by  the  other  players.  Bridge 
and  Poker  are  also  “games  of  chance,”  in  contrast  to  Kriegspiel. 

The  payment  function  for  a  game  of  strategy  can  be  very  complex.
However,  it  is  not  correct  to  assume  that  in  a  game  of  strategy  each 
player  is  necessarily  competing  with  the  other  players.  In  some  games 
(e.g.,  Bridge)  players  must  form  teams,  and  each  player  must  cooperate
with  those  who  are  in  his  team.  In  other  games  (e.g.,  Poker)  it  may 
sometimes  be  strategically  sound  for  two  players  to  form  a  temporary
alliance  against  another  player.  One  can  even  devise  games  of  strategy
in  which  there  is  no  competition  between  players  at  all  (note  4-4). 

However,  most  popular  games  of  strategy  do  have  some  degree  of 
competition  involved  in  them.  Many  games  can  be  described  as  strictly 
competitive;  such  games  are  also  known  as  zero-sum  games,  since  their 
payment  functions  specify  that  for  any  play  of  such  a  game  the  sum  of 
the  payments  received  by  all  players  in  the  game  must  be  equal  to  zero. 
In  a  strictly  competitive  game,  no  player  receives  a  positive  payment 
unless  other  players  receive  counterbalancing  negative  payments.  Examples
of  zero-sum  games  are  Chess,  Checkers,  Tic-tac-toe,  Hex,  and
GO.  In  these  games  the  payments  that  may  be  received  are  1,  0,  and
-1  (“win,”  “draw,”  and  “lose”).²  These  games  are  also  examples  of
two-person,  nonchance  games  of  perfect  information. 
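The zero-sum condition is easy to state as a check on a payment function. The following Python sketch assumes, for illustration only, that the payment function is given as a table mapping each play to a tuple of payments, one per player; the function and table names are hypothetical.

```python
def is_zero_sum(payment_function):
    """True if, for every play, the players' payments sum to zero."""
    return all(sum(payments) == 0 for payments in payment_function.values())


# Payments (player 1, player 2) for the three kinds of play in a game
# such as Chess, using the values 1, 0, and -1 for win, draw, and lose.
WIN_DRAW_LOSE = {
    "player 1 wins": (1, -1),
    "draw": (0, 0),
    "player 2 wins": (-1, 1),
}

# A made-up game with no competition: both players gain or lose together.
COOPERATIVE = {
    "both succeed": (1, 1),
    "both fail": (-1, -1),
}
```

The first table passes the check and the second does not, which matches the distinction drawn above between strictly competitive games and games without competition.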

For  a  computer  program  to  play  a  game,  it  must  be  able  to  select 
a  legal  alternative  whenever  it  is  required  to  make  a  move.  For  it  to 
play  the  game  well,  it  must  select  alternatives  that  will  tend  to  bring 
about  plays  of  the  game  for  which  the  payment-function  awards  the 
program  a  large  payment. 

There  are  essentially  two  ways  a  computer  program  can  go  about 
selecting  desirable  alternatives.  We  shall  refer  to  them  as  the  local  and 
global  approaches.  The  local  approach  has  been  fairly  successful  with 
a  few  difficult  games  (Kalah,  Checkers,  and  Chess),  although  its  suc¬ 
cess  has  diminished  with  the  more  difficult  games.  Except  for  games  with 
very  small  state  spaces,  or  during  the  “end  plays”  of  very  large  games, 
it  is  generally  not  possible  for  the  local  approach  to  work  perfectly,  in 
the  sense  of  always  selecting  the  best  available  alternative.  On  the  other 
hand,  the  global  approach  has  had  success  with  a  few  limited  classes  of
relatively  simple  games  (e.g.,  Tic-tac-toe,  Nim,  and  Hex),  but  its  tech¬ 
niques  may  eventually  be  extended  to  more  difficult  games. 

A  program  that  uses  the  global  approach  is  designed  to  analyze  the 
game  as  a  whole.  The  computer  might,  for  example,  prove  theorems 
about  the  game,  using  its  description  of  the  game  and  its  past  experience 
at  playing  the  game.  Such  theorems  might  reduce  the  game  to  other, 
simpler  games.  This  approach  has  been  investigated  by  Banerji,  Koffman,
Amarel,  Pitrat,  and  others.  Programs  that  use  it  are  discussed  in
the  final  section  of  this  chapter. 

²  It  is  often  more  convenient  for  a  programmer  to  effectively  give  win,  draw,
and  lose  the  values  ∞,  0,  and  -∞.

A  program  that  uses  the  local  approach  is  designed  to  analyze  a
part  of  the  state  space  of  the  game.  Given  a  situation  that  is  the  program’s
move,  the  program  can  enumerate  some  of  the  paths  through  the
state  space  of  the  game  which  might  result  from  choosing  among  the
available  alternatives.  The  program  could  be  designed  to  analyze  these 
paths,  using  its  description  of  the  payment  function,  and  to  select  an 
alternative  that  “leads  to  a  desirable  set  of  paths.”  In  the  case  of  zero- 
sum  games,  the  type  of  analysis  the  program  must  perform  is  known 
as  a  minimax  analysis.  The  collection  of  paths  through  the  state  space 
of  a  game,  which  emanate  from  a  given  node  in  that  state  space,  is 
known  as  the  game  tree  below  that  node;  the  game  tree  below  the  start 
node  of  a  game  is  often  referred  to  simply  as  the  game  tree  of  the 
game. 
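A minimax analysis of a small, fully enumerated game tree can be sketched as follows. This is a minimal Python illustration for a zero-sum, two-person, nonchance game, not the procedure of any particular program discussed in this chapter. It assumes a hypothetical encoding in which a tip node is a number (the payment to the player moving at the root) and any other node is a list of its successors; the names `minimax`, `best_alternative`, and `TREE` are likewise hypothetical.

```python
def minimax(node, maximizing):
    """Value of `node` to the root player when both sides choose optimally."""
    if not isinstance(node, list):      # tip node: payment to the root player
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_alternative(root):
    """Index of the alternative at the root with the highest minimax value."""
    values = [minimax(child, False) for child in root]   # opponent moves next
    return max(range(len(root)), key=values.__getitem__)


# Root player chooses among three alternatives; the opponent then chooses
# between two tips. Payments use 1, 0, -1 for win, draw, lose.
TREE = [[1, -1], [0, 0], [-1, 1]]
```

In this tree the first and third alternatives each allow the opponent to force a loss, so the analysis selects the middle alternative, which guarantees a draw.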

Most  AI  research  on  game  playing  has  been  concerned  with  developing
computer  programs  that  use  local  analysis  to  play  zero-sum,
two-person,  nonchance  games  of  perfect  information.  The  next  section
describes  ways  in  which  heuristic  search  techniques  can  be  used  to 
analyze  such  games  locally.  The  remaining  sections  of  the  chapter  dis¬ 
cuss  programs  that  have  been  written  to  play  games.  Some  programs 
discussed  play  imperfect-information  games  of  chance.  For  a  more
extensive  yet  simple  treatment  of  classical  (enumerative)  game  theory,
see  Williams  (1954).  The  original  book  on  the  subject  (von  Neumann 
and  Morgenstern,  1944)  is  highly  recommended.  Discussions  of  the 
state-space  approach  to  the  description  of  games  and  some  examples  of 
global  analysis  of  games  are  given  in  Banerji  (1969,  1970).  Nilsson 
(1971)  and  Slagle  (1971)  present  detailed  formalizations  of  current 
applications  of  heuristic  search  techniques  to  game  playing  (i.e.,  local 
analysis). 


GAME  TREES  AND  HEURISTIC  SEARCH 
Game  Trees  and  Minimax  Analysis

In  general,  the  techniques  presented  in  Chapter  3  for  searching 
graphs  and  trees  in  order  to  solve  problems  are  applicable  to  the  design 
of  game-playing  programs.  In  particular,  the  terminologies  and  concepts 
associated  with  game  trees  are  similar  to  those  for  problem  trees,  as  de¬ 
fined  in  Chapter  3.  The  basic  difference  between  games  and  the  puzzle¬ 
like  problems  already  discussed  is  that,  with  a  game,  different  nodes 
belong  to  different  players  and  no  player  can  completely  control  the 
path  that  is  actually  taken  through  the  state  space  (note  4-5).  Through¬ 
out  this  section  we  shall  be  concerned  only  with  zero-sum,  two-person, 
nonchance  games  of  perfect  information. 

The  game  tree  below  a  given  node  in  the  state  space  of  a  game  is 
usually drawn with the given node at the top, as the root of the tree.
The successors to the root node are placed immediately below it, and
arcs are drawn from the root to each of its successors. The root node
and its successors are known as the “top nodes” of the tree. The process
is then repeated for each of the successors to the root node. Figure 4-2


Figure 4-2: A state-space graph for a simple game and the corresponding
game tree.


shows  a  state-space  graph  for  a  simple  two-person  game,  and  the  corre¬ 
sponding  game  tree  of  the  game  (i.e.,  the  game  tree  below  the  start, 
or  root,  node).  As  may  be  seen  from  Fig.  4-2,  it  is  possible  for  the 
“same”  state-space  node  to  occur  in  many  places  throughout  a  game 
tree.  This  is  just  another  way  of  saying  that  there  may  be  many  paths 
connecting two nodes in a state space. Thus, “3, 2, 6, 7” and “3, 4, 5, 7”
are two different paths connecting state-space nodes 3 and 7 in Fig.
4-2. The purpose of a game tree is to represent separately each of the
possible paths through the state space of the game. Thus, each game-tree
node represents a path through the state space of the game; that is,
a sequence of state-space nodes. The expansion or generation of a game
tree  terminates  with  those  nodes  that  do  not  have  successors;  a  terminal 
node  in  a  game  tree  is  often  referred  to  as  a  tip  node.  In  Fig.  4-2  there 
are  ten  tip  nodes  in  the  game  tree  and  only  two  terminal  nodes  in  the 
corresponding  state-space  graph.  The  number  of  plays  of  a  game  is 
equal  to  the  number  of  tip  nodes  of  its  game  tree.  Thus,  there  are  ten 
plays  for  the  game  whose  state-space  graph  is  shown  in  Fig.  4-2. 
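
The relation between a state-space graph and its game tree can be sketched in a few lines of Python. The graph `GRAPH` below is hypothetical (it is not the graph of Fig. 4-2), and the function name `plays` is our own; the sketch assumes the graph has no cycles.

```python
# Hypothetical acyclic state-space graph: node -> list of successor nodes.
# This is NOT the graph of Fig. 4-2; it only illustrates the idea.
GRAPH = {
    1: [2, 3],
    2: [4, 5],
    3: [4, 5],
    4: [6],
    5: [6],
    6: [],  # the only terminal node of the state space
}


def plays(graph, start):
    """Return every path from start to a terminal node.

    Each path corresponds to one tip node of the game tree below start.
    """
    successors = graph[start]
    if not successors:  # terminal node: the path ends here
        return [[start]]
    paths = []
    for nxt in successors:
        for tail in plays(graph, nxt):
            paths.append([start] + tail)
    return paths


all_plays = plays(GRAPH, 1)
print(len(all_plays))  # number of tip nodes in the game tree
```

Here a state space with a single terminal node yields a game tree with several tip nodes, because state-space node 6 is reached by more than one path.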

It is evident from Fig. 4-2 that even a game with a small state space
may have a large game tree and a large number of possible plays. Games
like Checkers, Chess, and Go have game trees so large that they cannot
be physically generated completely. (They also have large state spaces.)
When  it  is  not  possible  to  actually  count  the  number  of  plays  of  a  game, 
the  number  may  be  approximated  by  using  estimates  of  the  average 
branching-factor  B  and  the  average  depth  D  of  the  game  tree.  Thus, 
we estimate³ that the game tree of Checkers has an average depth of
100 and an average branching factor of 6 (i.e., the average possible
play of the game might run 100 moves and each move might have an
average of 6 available alternatives). The total number of possible plays
for Checkers is then

$$B^D = 6^{100} \approx 10^{78}$$

Similarly, it has been estimated that there are $10^{120}$ possible plays
for Chess (Shannon, 1950a,b) and […] possible plays for Go (Zobrist,
1969).
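
The estimate for Checkers is easy to check numerically. The short Python sketch below works with logarithms, so that the very large number never has to be printed in full; the function name `log10_plays` is our own.

```python
import math


def log10_plays(branching_factor, depth):
    """Return log10 of B**D, the estimated number of plays."""
    return depth * math.log10(branching_factor)


# Checkers, with the estimates B = 6 and D = 100 used in the text.
print(round(log10_plays(6, 100)))  # 78, i.e. about 10**78 plays
```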

Let  us  suppose  for  a  moment  that  a  player,  whom  we  shall  call 
player  1,  actually  could  generate  the  entire  game  tree  for  any  finite 
game,  no  matter  how  large,  and  discuss  how  he  might  select  a  strategy 
for  a  (zero-sum,  two-person,  nonchance,  perfect-information)  game 
such as Checkers, Chess, or Go. We shall give this person “infinite time
and resources” and see what happens.

As  stated  before,  each  player’s  objective  is  to  maximize  the  pay¬ 
ment  he  receives  during  the  play  of  the  game  that  actually  occurs.  We 
are  concerned  only  with  perfect-information  games.  Thus,  the  player 
whose move it is knows exactly what path has been taken from the
start  node  to  arrive  at  the  current  situation  in  the  game,  and  he  knows 
exactly  what  the  current  situation  is  (e.g.,  what  pieces  are  where  on 
the  checkerboard).  Because  he  is  given  a  description  of  the  rules  of  the 
game,  he  knows  exactly  what  alternatives  he  can  choose  to  apply  to  the 
current  situation  and  what  situations  will  result.  Because  our  player  has 
infinite  time  and  resources,  he  can  generate  the  complete  game  tree 
below  the  given  node,  and  can  determine  the  payments  associated  with 
each of the plays emanating from that node. He will then have
information something like that indicated in Fig. 4-3.

Nodes B, C, E, F, and G in the figure are tip nodes of the tree. Each
of  the  tip  nodes  of  the  game  tree  identifies  a  different  play  of  the  game. 
Using  the  payment  function,  player  1  can  calculate  the  payment  speci¬ 
fied  for  each  play  of  the  game,  and  he  can  consider  this  payment  to  be 

3 This is based on a conversation with Arthur Samuel and is only a very
rough estimate. Another figure often given is $10^{40}$ plays (sometimes
$10^{40}$ nodes in the game tree), based on an estimate in Samuel (1959).
The higher estimate is used here.
Figure 4-3. Player 1’s maximum necessary game tree.

associated with the corresponding tip node of the tree. Thus, he might
determine that the play represented by node B would yield him a
payment of +10, whereas the play represented by node C would yield him
a payment of −2. Consequently, he would know that if he and player
2 should take the path through the state space represented by node A
in the game tree, they would arrive at a situation in which it would be
player 1’s turn to move, and he would be able to select from two
alternatives: one yielding him a payment of +10, the other yielding a
payment of −2. Because each player’s objective is to maximize the
payment he receives, we say that the value of the path represented by
node A is +10 to player 1 and −10 to player 2.

Similarly, player 1 could determine the payments associated with
the plays represented by tip nodes E, F, and G, respectively; he might
calculate that play E would yield him a payment of +1, play F a payment
of −6, and play G a payment of +9. Because the game is a zero-sum
game, he knows that E, F, and G will yield player 2 payments of −1,
+6, and −9, respectively. Consequently, he would know that if he and
player 2 should take the path through the state space represented by
node D in the game tree, they would arrive at a situation in which it
would be player 2’s turn to move, and player 2 would choose the
alternative leading to play F. Thus, the value of node D is −6 to player 1
and +6 to player 2.

Because he has “infinite time and resources,” player 1 can continue
to find the value of each node in the game tree below his current
situation by “backing up” his evaluation of nodes from the tip nodes of
the game tree according to the following rule: The value to player 1 of
a node at which it is player 1’s turn to move is the maximum of the
values of its successor nodes to player 1. Similarly, the value to
player 2 of a node at which it is player 2’s turn to move is the maximum
of the values of its successor nodes to player 2. Moreover, because of
the zero-sum nature of the game, the value to player 1 of a node at
which it is player 2’s turn to move is the minimum of the values of its
successor nodes to player 1, and the value to player 2 of a node at
which it is player 1’s turn to move is the minimum of the values of its
successor nodes to player 2. If player 1 determines the values of all
nodes in the game tree below his current situation according to these
rules, he is said to have done a complete minimax analysis, or
evaluation, of the game tree below his current situation. The value for a
node that one obtains by performing a complete minimax evaluation is
referred to as the theoretical value of the node.

All  the  game  trees  considered  are  finite,  so  player  1  will  be  able 
to  generate  the  complete  game  tree  and  do  a  minimax  analysis  of  its 
nodes  in  a  finite  time  (he  is  given  infinite  time  and  resources  simply 
because  there  is  no  a  priori  limit  to  how  big  the  tree  might  be  and  how 
much time he might require; whatever time he does require, though,
will  be  finite).  He  will  then  know  the  theoretical  values  of  the  successor 
nodes  to  his  current  situation.  If  player  1  chooses  an  alternative  leading 
to  a  successor  node  that  has  the  maximum  theoretical  value  (to  him) 
of  all  the  successor  nodes  to  his  current  situation,  and  if  he  continues  to 
choose  in  this  way  whenever  it  becomes  his  turn  to  move,  we  say  that 
he is playing “perfectly.” If player 1 plays perfectly, then he is
guaranteed at least the best payment he can expect when player 2 also
plays perfectly. If player 2 does not play perfectly, then player 1 will
receive a payment at least that high (note 4-6).

EXAMPLE 4-1. How much time would it take an “attainable”
machine to generate and minimax-evaluate the complete game
tree for Checkers? We have assumed $B = 6$ and $D = 100$. By
the rule for trees, developed in Chapter 3, there are approximately
$B^{D+1}/(B - 1)$ nodes in the complete game tree; thus,
there are $6^{101}/5$ = approximately $2 \times 10^{78}$ nodes in the game
tree of Checkers. The “attainable” machine might generate the
game tree at a rate of 1 node per nanosecond and then
minimax-evaluate it at a rate of 1 node per nanosecond. There are
$3.15 \times 10^{18}$ nanoseconds in a century, so the machine would require
$(4 \times 10^{78})/(3.15 \times 10^{18})$ = approximately $10^{60}$ centuries to
complete this procedure.
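
The arithmetic of this example can be repeated in Python. The sketch below is a rough check only; it works in logarithms and assumes, as the example does, one pass to generate the tree and one pass to evaluate it, each at 1 node per nanosecond. The names `NS_PER_CENTURY` and `log10_centuries` are our own.

```python
import math

# About 3.15e7 seconds per year, 100 years, 1e9 nanoseconds per second.
NS_PER_CENTURY = 3.15e7 * 100 * 1e9


def log10_centuries(b, d, passes=2):
    """Return log10 of the centuries needed at 1 node per nanosecond.

    The node count is approximated by B**(D+1) / (B - 1); 'passes' is the
    number of times every node is handled (generate, then evaluate).
    """
    log_nodes = (d + 1) * math.log10(b) - math.log10(b - 1)
    return log_nodes + math.log10(passes) - math.log10(NS_PER_CENTURY)


print(round(log10_centuries(6, 100)))  # 60, i.e. about 10**60 centuries
```
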
Example 4-1 illustrates that it really would be necessary to give
player  1  “infinite”  time  and  resources,  at  least  by  comparison  with  cur¬ 
rent  scientific  estimates  for  the  lifetime  of  the  universe  (<10^^^  years), 
if  we  expect  him  to  do  a  complete  minimax  analysis  of  Checkers  at  an 
“attainable”  rate.  In  general,  it  is  not  possible  for  a  computer  program 
to  minimax-evaluate  the  complete  game  tree  below  a  given  node  of 
Checkers  unless  that  node  happens  to  be  very  close  to  the  tips  of  the 
tree. The Exercises at the end of this chapter show that the same
results obtain for Chess and Go.

Even  though  a  computer  program  cannot  usually  generate  the 
entire  game  tree  below  a  given  node,  it  can  still  generate  a  portion  of 
that  game  tree.  In  most  of  the  possible  situations  (nodes  in  the  state 
space)  that  might  occur  in  games  like  Checkers,  for  example,  the  aver¬ 
age  node  may  have  six  successors,  but  of  these  six  perhaps  only  three 
would  be  considered  “plausible”  or  “reasonable”  by  a  human  Checkers 
player.  If  a  program  could  be  designed  to  generate  only  those  successors 
that were “reasonable,” that could do a minimax analysis on the
resulting reasonable game tree, and that could select the alternative below
its current situation with the highest reasonable evaluation, it would
still be able to play a very good game. We can estimate that Checkers
has, on the average, three reasonable successors to each node and that
the average reasonable play has a length of 40 moves. There are thus
$3^{40} \approx 10^{19}$ reasonable plays of Checkers. Similarly, there are about
$5^{50} \approx 10^{35}$ reasonable plays of Chess and 10^[…] reasonable plays of Go.
Of course playing “reasonably” is not the same thing as playing
“perfectly.” If we had the complete evaluation of the Checkers (or Chess
or Go) game tree, we might find that some nodes people currently
consider  “reasonable”  have  in  fact  very  low  theoretical  values;  con¬ 
versely,  we  might  find  that  some  nodes  people  currently  consider  “un¬ 
reasonable”  have  very  high  theoretical  values. 

EXAMPLE 4-2. How much time would it take an “attainable”
machine to generate and minimax-evaluate the complete reasonable
game tree for Checkers? On the basis of $B = 3$ and $D = 40$,
there are approximately $3^{41}/2$ nodes in the complete, reasonable
game tree. The “attainable” machine might generate the
game tree at a rate of 1 node per nanosecond and minimax-evaluate
it at the same rate. Thus, the machine would require
about $(3.5 \times 10^{19})/(3.15 \times 10^{18})$ = approximately 10 centuries,
or a thousand years, to complete this procedure.

Again it is not practically possible for a computer to evaluate the
complete, reasonable game trees of games like Chess, Checkers, and Go.
Instead, when a computer program attempts to select the best
alternative, or successor, available for a given node in the game tree, it will
(if  it  uses  local  analysis)  generate  only  a  portion  of  the  reasonable 
game  tree  below  that  node,  and  will  minimax-evaluate  that  portion  to 
estimate  the  best  immediate  alternative.  It  will  then  output  a  description 
of  that  alternative  (as  being  its  “choice”  for  the  move)  and  wait  until 
it  is  required  to  make  another  move  (estimate  another  alternative). 
The  rest  of  this  section  considers  how  a  computer  program  can  generate 
a  reasonable  portion  of  a  game  tree  and  how  it  may  analyze  the  portion 
that  it  generates. 

Static Evaluations and Backed-up Evaluations

In  order  to  generate  and  analyze  a  portion  of  the  reasonable  game 
tree  below  a  given  node,  it  is  necessary  to  judge  the  “reasonableness” 
of  nodes  in  some  way  that  is  not  dependent  upon  having  judged  many 
of  their  successor  nodes.  A  static  evaluation  function  is  a  method  for 
estimating  the  value  of  a  node  which  is  not  dependent  on  the  values  of 
the  successors  to  that  node.  A  good  static  evaluation  function  is  one 
that  tends  to  give  estimates  that  agree  with  the  true,  theoretical  values 
of  the  nodes  in  a  game  tree.  Different  games  require  different  static- 
evaluation  functions.  In  general,  it  is  not  possible  to  design  a  static- 
evaluation  function  that  is  perfect  for  a  given  game;  that  is,  one  that  will 
estimate  for  each  node  a  value  that  is  equal  to  the  theoretical  value  of 
that node.⁴

For  our  purposes,  a  static  evaluation  function  is  necessarily  a  com¬ 
putational  procedure  that  can  be  applied  by  a  computer  to  its  descrip¬ 
tion  for  any  given  situation  that  might  occur  during  a  play  of  the  game. 
The  function  should  yield  for  the  situation  a  numerical  value  approxi¬ 
mating  that  which  would  be  obtained  by  analyzing  the  game  com¬ 
pletely.  When  applied  to  a  given  node  (situation),  the  static  evaluation 
function  may  take  into  account  such  things  as  the  number  of  pieces  one 
has,  the  positions  of  a  game  board  one  occupies,  the  number  of  suc¬ 
cessor  nodes  to  the  given  node,  or  whether  any  successor  nodes  repre¬ 
sent  “captures.” 
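
To make the idea concrete, here is a minimal Python sketch of a static evaluation function. Everything in it is hypothetical: the description of a situation as a dictionary, the key names, and the weights 1.0 and 0.1 are chosen only for illustration and are not taken from any program discussed in this chapter.

```python
def static_evaluation(position):
    """Estimate the value of a position to the program.

    The estimate uses only features of the position itself; it never
    looks at the values of successor nodes. 'position' is a dictionary
    with the hypothetical keys 'my_pieces', 'their_pieces', 'my_moves',
    and 'their_moves'.
    """
    material = position["my_pieces"] - position["their_pieces"]
    mobility = position["my_moves"] - position["their_moves"]
    # Illustrative weights: a piece counts ten times as much as a move.
    return 1.0 * material + 0.1 * mobility


example = {"my_pieces": 8, "their_pieces": 7, "my_moves": 5, "their_moves": 9}
print(round(static_evaluation(example), 2))
```

The second feature counts the successors of the node (the moves available), which is one of the quantities mentioned above; counting successors is not the same as evaluating them.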

The  next  few  pages  discuss  how  a  game  tree  may  be  analyzed,  or 
evaluated,  given  a  static-evaluation  function.  As  stated  before,  static- 

4  If  one  did  have  a  perfect  static-evaluation  function,  there  would  be  no 
need  to  generate  a  game  tree  at  all;  instead,  to  determine  the  best  arc  from  a 
given node, one would merely have to apply the static-evaluation function to
each  of  the  successors  to  that  node,  and  then  select  an  arc  leading  to  a  node  with 
the  maximum  static  evaluation.  Thus,  one  would  be  playing  perfectly,  in  the 
sense  defined  in  the  text. 
evaluation functions may also be used to generate game trees that are
reasonable  portions  of  complete  game  trees.  However,  for  purposes  of 
explication,  the  discussion  of  techniques  for  generating  game  trees  will 
be  deferred  to  the  end  of  this  section. 

Suppose,  therefore,  that  a  portion  of  the  reasonable  game  tree 
below  a  given  node  has  been  generated;  such  a  game  tree  can  be  evalu¬ 
ated  by  a  computer  program  that  makes  use  of  minimax  analysis  and  a 
static-evaluation  function.  The  rules  for  the  minimax  analysis  are  as 
follows: If a given node is one for which it is the program’s turn to move,
its value is the maximum of the values of its successors. If a node is
one for which it is the opponent’s turn to move, its value (to the
computer) is the minimum of the values of its successors. The value of a
tip node is its
static  evaluation;  that  is,  the  result  of  applying  the  static-evaluation  func¬ 
tion  to  it.  Figure  4-4  shows  a  portion  of  a  game  tree  for  which  the  tip 
nodes  have  been  assigned  values  according  to  some  hypothetical  static- 
evaluation  function  and  the  remaining  nodes  have  been  given  values 
according to the rules of the minimax procedure. A value given by the
minimax procedure to a node that is not a tip node is known as a
backed-up value for the node. To determine the backed-up value for a given
node, one must first find the static evaluations of the tip nodes that are
below it in the game tree, and then, using the rules of the minimax
procedure, “back up” evaluations until a value reaches the given node.
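
These rules can be sketched in Python as follows. The tree `TREE`, the static values in `STATIC`, and all function names are hypothetical; the tree is not the one shown in Fig. 4-4. The parameter `depth` limits how much of the tree below the given node is examined.

```python
# Hypothetical portion of a game tree and hypothetical static evaluations.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
STATIC = {"root": 0.0, "a": 0.5, "b": 0.4,
          "a1": 0.2, "a2": 0.9, "b1": 0.3, "b2": 0.6}


def successors(node):
    return TREE.get(node, [])


def static_eval(node):
    return STATIC[node]


def backed_up_value(node, depth, program_to_move):
    """Minimax value of node, looking at most 'depth' levels below it."""
    children = successors(node) if depth > 0 else []
    if not children:  # tip node of the portion being examined
        return static_eval(node)
    values = [backed_up_value(c, depth - 1, not program_to_move)
              for c in children]
    return max(values) if program_to_move else min(values)


print(backed_up_value("root", 1, True))  # 0.5: successor a looks best
print(backed_up_value("root", 2, True))  # 0.3: successor b is now preferred
```

In this small example the choice of move changes when one more level of the tree is examined, which is the situation described in the next paragraph.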

The  accuracy  of  the  backed-up  evaluation  of  a  given  node  (how 
close  it  is  to  the  theoretical  value  for  that  node)  is  greatly  dependent 
on  the  amount  of  the  game  tree  below  that  node  to  which  one  applies 
the  minimax  procedure.  Again,  in  general,  neither  the  static  evaluation 
nor  the  backed-up  evaluation  will  be  infallible  indicators  of  the  theoreti¬ 
cal  evaluation  for  a  given  node.  However,  the  accuracy  of  the  static 
evaluation  will  often  be  better  for  nodes  near  the  tips  of  the  complete 
game  tree  than  it  is  for  nodes  near  the  root.  Thus,  the  backed-up  evalu¬ 
ation  of  a  node  near  the  root  of  the  tree  will  tend  to  be  more  accurate 
than  the  static  evaluation  of  that  node,  because  the  minimax  procedure 
makes  use  of  the  static-evaluation  function  where  it  is  most  accurate 
(on  nodes  nearer  the  tips  of  the  tree). 

The  Alpha-Beta  Technique 

In  practice,  it  is  important  to  recognize  that  the  nature  of  the 
minimax  procedure  makes  it  unnecessary  to  obtain  evaluations  for  all 
nodes  in  the  game  tree  when  evaluating  the  top  nodes.  A  method  exists 
for  determining  whether  the  evaluation  of  a  given  node  can  affect  the 
evaluation  of  nodes  that  are  above  it  in  the  game  tree.  This  method  is 
known as the alpha-beta technique.⁵ To see how it works, let us suppose
that  a  game-playing  program  is  given  the  task  of  evaluating  the  (portion 
of  a)  game  tree  in  Fig.  4-4,  and  is  proceeding  to  minimax  from  left 
to  right.  The  tree  is  reproduced  in  Fig.  4-5,  with  the  addition  that 
certain  significant  nodes  have  been  lettered. 

The  first  step  of  the  program  is  to  obtain  the  static  evaluations  of 
nodes  A,  B,  and  C.  These  are  found  to  be  0.2,  0.9,  and  0.3,  respectively, 
so  the  backed-up  value  of  node  D  above  them  is  determined  by  the 
minimax  procedure  to  be  0.2.  The  next  step  of  the  program  is  to  obtain 
the  static  evaluation  of  node  E,  which  is  found  to  be  0.1.  Consequently, 
we  know  that  the  backed-up  value  of  node  F  must  be  less  than  or  equal 
to  0.1  (since  the  value  of  F  is  the  minimum  of  the  values  of  its  suc¬ 
cessors).  Now,  the  value  of  node  G  is  the  maximum  of  all  values  im¬ 
mediately  below  it  because  G  represents  a  situation  in  which  it  is  the 
program’s  turn  to  move,  and  the  program  should  take  the  choice  that 
has  the  greatest  evaluation.  This  means  that  node  F  and  all  the  nodes 
below it need not be considered further. The reason is that the value of
node D has already been determined, and whatever the value of F, it
is less than that of D. Similarly, when the program evaluates node H
as being −0.1, it knows that neither I nor any other nodes below it
need be evaluated. Thus, the value of node G is set at 0.2.

Next,  the  program  evaluates  situations  J  and  K  and  sets  the  value 
of  L  at  their  minimum,  which  is  0.6.  Since  the  evaluation  of  M  is  the 
maximum  of  the  values  of  the  nodes  immediately  below  it,  the  value  of 
M is greater than or equal to that of L; that is, at least 0.6.

The value of N, however, is the minimum of those of G and M.
But  since  G  has  a  value  of  0.2,  it  is  not  necessary  to  evaluate  any  more 
of  the  nodes  below  M,  and  thus  the  value  of  N  can  be  set  equal  to  0.2. 

The  program  continues  in  this  manner  to  evaluate  only  those  nodes 
in  the  tree  that  could  change  the  values  of  the  nodes  above  them.  As  an 
exercise,  the  reader  may  verify  that,  in  order  to  determine  the  most 
desirable  alternative,  the  program  need  continue  developing  evaluations 
only  for  those  nodes  labeled  P  through  U  in  Fig.  4-5.  The  other  nodes 
of  the  tree  need  not  be  considered  at  all. 
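
The reasoning just described can be sketched in Python. The sketch below is one common way of writing the alpha-beta technique and is not taken from any program discussed here. The tree `node_g` is a hypothetical stand-in for the part of Fig. 4-5 below node G: the tip values 0.2, 0.9, 0.3, 0.1, and −0.1 are those given in the text for A, B, C, E, and H, while the remaining tip values are invented.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"),
              visited=None):
    """Return the minimax value of node, skipping branches that cannot matter.

    A node is either a number (the static evaluation of a tip node) or a
    list of successors. 'visited', if given, collects the tip values that
    were actually evaluated.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:  # the program already has something better
            break
    return value


# Successors D, F, and I of node G; the program is to move at G.
node_g = [[0.2, 0.9, 0.3],   # D: tips A, B, C (values from the text)
          [0.1, 0.7, 0.5],   # F: tip E, then two hypothetical tips
          [-0.1, 0.4, 0.8]]  # I: tip H, then two hypothetical tips

seen = []
print(alphabeta(node_g, True, visited=seen))  # 0.2
print(seen)  # [0.2, 0.9, 0.3, 0.1, -0.1]
```

Only five of the nine tip nodes are evaluated, yet the value obtained for G is the same as the one a full minimax evaluation would give.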

The  alpha-beta  technique  is  essentially  a  process  of  “using  common 
sense”  to  carry  evaluations  up  the  tree  with  a  minimum  amount  of  work. 
It  can  be  proved  that,  with  respect  to  a  given  static-evaluation  function, 
the  alpha-beta  technique  will  always  assign  the  same  values  to  the  top 
nodes  of  a  given  tree  as  would  the  minimax  procedure.  (A  detailed 


5 This technique received its name from McCarthy, who, with his students,
did research on it at MIT.
formalization of the alpha-beta technique is given in Nilsson, 1971.)
The  savings  that  can  result  from  using  the  alpha-beta  technique  are 
enormous.  With  optimum  ordering  of  the  nodes,  the  number  of  suc¬ 
cessors  to  a  given  node  which  need  to  be  evaluated  is  lowered  almost 
to  the  square  root  of  the  total  number  of  successors  to  that  node.  If  the 
branching  factor  of  the  tree  is  B  then,  with  the  alpha-beta  technique, 
one in effect evaluates a tree with branching factor $\sqrt{B}$. Thus, the
depth  to  which  the  game  tree  below  a  given  node  can  be  analyzed, 
using  the  same  total  number  of  evaluations,  is  nearly  doubled. 

However,  the  worth  of  the  technique  is  greatly  dependent  upon 
the  order  in  which  the  nodes  of  the  tree  are  taken  for  examination.  The 
reader  may  verify  this  for  himself  by  working  the  alpha-beta  evaluation 
of Fig. 4-5 from right to left instead of left to right; the right-to-left
ordering  makes  it  necessary  to  evaluate  almost  all  nodes  of  the  tree.  In 
using  the  alpha-beta  technique,  it  is  desirable  to  have  some  method  that 
will  make  it  likely  the  best  nodes  are  evaluated  first. 
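
The effect of ordering can be seen by counting tip evaluations. The Python sketch below applies an alpha-beta search to a small hypothetical tree (not the tree of Fig. 4-5) and to its mirror image; searching the mirror image from left to right is the same as searching the original from right to left. All names are our own.

```python
def search(node, maximizing, alpha, beta, counter):
    """Alpha-beta search that counts evaluated tip nodes in counter[0]."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        value = search(child, not maximizing, alpha, beta, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # remaining successors cannot change the result
            break
    return best


def mirror(node):
    """Reverse the order of successors at every level of the tree."""
    if isinstance(node, list):
        return [mirror(child) for child in reversed(node)]
    return node


def tips_evaluated(tree):
    counter = [0]
    search(tree, True, float("-inf"), float("inf"), counter)
    return counter[0]


tree = [[0.2, 0.9, 0.3], [0.1, 0.7, 0.5], [-0.1, 0.4, 0.8]]
print(tips_evaluated(tree))          # 5
print(tips_evaluated(mirror(tree)))  # 9
```

With the favorable ordering, five tips are evaluated; with the reversed ordering, all nine are.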

Generating  (Searching)  Game  Trees 

This  section  concludes  with  a  description  of  some  important 
techniques  for  generating  game  trees.  The  techniques  discussed  include 
plausibility  ordering,  shallow  searching,  forward  pruning,  the  use  of 
termination  criteria,  and  dynamic  generation  and  evaluation.  As  ex¬ 
plained  at  the  end  of  the  preceding  section,  the  description  to  this  point 
relates  to  how  a  game  tree  may  be  evaluated,  using  a  static-evaluation 
function,  once  the  tree  has  been  generated.  However,  static-evaluation 
functions  may  also  be  used  by  procedures  that  generate  reasonable 
portions  of  game  trees. 

Part  of  the  motivation  for  discussing  game-tree  generating  tech¬ 
niques  may  be  evident  from  the  previous  description  of  the  alpha-beta 
technique.  Suppose  we  have  a  computer  program  that  uses  the  alpha- 
beta  technique  to  evaluate  game  trees  that  are  presented  to  it,  and 
suppose  this  program  always  applies  the  technique  by  working,  say, 
from  left  to  right  across  the  tree.  This  program  will  work  most  effi¬ 
ciently  if  the  trees  that  are  presented  to  it  are  “correctly  ordered,”  that 
is,  if  the  successors  to  each  node  in  a  given  tree  are  arranged  below 
that  node  from  left  to  right  in  the  descending  order  of  their  eventual, 
backed-up  evaluations.  One  of  the  purposes  of  game-tree  generating 
techniques  is  to  develop  game  trees  that  will  tend  to  be  correctly 
ordered  so  that  the  alpha-beta  technique  can  be  profitably  applied  to 
them. 
Of course there is no way to ensure that a game tree is correctly
ordered  without  having  already  performed  the  back-up,  or  minimax, 
evaluation  that  we  wish  the  alpha-beta  technique  to  replace.  However, 
we  can  increase  the  likelihood  that  the  tree  will  be  correctly  ordered  if 
we  make  use  of  some  less  extensive  technique  that  will  give  the  nodes 
in  the  tree  a  “plausible  ordering.”  Three  types  of  plausibility-ordering 
techniques  are  generators,  shallow  search,  and  dynamic  ordering. 
Dynamic  ordering  will  be  described  at  the  end  of  this  section. 

A  generator  is  a  procedure  that  automatically  produces  first  the 
most  desirable  alternatives  (and  the  situations  to  which  they  lead) 
below  a  given  situation  and  then  produces  less  desirable  alternatives, 
etc.  Thus,  a  generator  in  Chess  might  be  designed  to  first  produce  those 
alternatives  that  create  situations  in  which  the  opponent  will  be  in 
check  or  a  piece  will  be  captured.  The  nodes  in  a  game  tree  can  be  given 
a  plausibility  ordering  corresponding  to  the  sequence  in  which  they  are 
produced  by  a  generator. 
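
A generator of this kind maps naturally onto a Python generator function. In the sketch below, the representation of a move as a dictionary and the keys `gives_check` and `captures` are hypothetical; a real Chess program would compute these properties from the board.

```python
def plausible_moves(moves):
    """Yield moves in plausibility order: checks, then captures, then the rest."""
    for move in moves:
        if move["gives_check"]:
            yield move
    for move in moves:
        if move["captures"] and not move["gives_check"]:
            yield move
    for move in moves:
        if not move["captures"] and not move["gives_check"]:
            yield move


moves = [
    {"name": "quiet move", "gives_check": False, "captures": False},
    {"name": "capture", "gives_check": False, "captures": True},
    {"name": "check", "gives_check": True, "captures": False},
]
print([m["name"] for m in plausible_moves(moves)])
# ['check', 'capture', 'quiet move']
```

Because the moves are produced one at a time, a search that cuts off early never pays for producing the less plausible alternatives.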

A  shallow  search  technique  is  a  procedure  that  makes  use  of  a 
static-evaluation  function  (not  necessarily  the  same  one  used  by  the 
alpha-beta  technique)  to  conduct  a  limited  tree-generation  and  evalua¬ 
tion  process  below  each  of  the  nodes  that  are  to  be  ordered.  Thus, 
suppose  a  plausibly  ordered  game  tree  is  being  generated  below  node  A 
in  Fig.  4-6;  a  shallow  search  technique  might  first  generate  the  small 
portion of the tree below A, shown in the figure. It would then apply its
static-evaluation function to the nodes at the bottom of this tree and
back up evaluations (probably using its own alpha-beta technique) to
nodes B, C, and D. The “shallow evaluations” it obtained for B, C, and
D might indicate that they should be plausibly ordered C, B, D; a
shallow search might then be done below C to determine a plausible
ordering for nodes E, F, and G. In general, when the game tree below
node  A  is  generated,  it  is  most  profitable  (for  the  overall  application  of 
the  alpha-beta  technique)  that  shallow  search  be  used  to  plausibly 
order  nodes  near  the  top  of  the  tree;  it  makes  relatively  little  difference 
whether shallow search is used to order nodes near the bottom (i.e.,
near the tip nodes).
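
A shallow search used for plausibility ordering might be sketched in Python as follows. The tree and static values are hypothetical and were chosen so that the successors B, C, and D of node A come out in the order C, B, D, as in the discussion above. For brevity the sketch backs up values by plain minimax rather than by alpha-beta.

```python
# Hypothetical small portion of the tree below node A, with hypothetical
# static evaluations for the nodes at the bottom of that portion.
TREE = {"A": ["B", "C", "D"],
        "B": ["B1", "B2"], "C": ["C1", "C2"], "D": ["D1", "D2"]}
STATIC = {"B1": 0.3, "B2": 0.5, "C1": 0.6, "C2": 0.7, "D1": 0.1, "D2": 0.9}


def shallow_value(node, depth, program_to_move):
    """Back up static evaluations from at most 'depth' levels below node."""
    children = TREE.get(node, []) if depth > 0 else []
    if not children:
        return STATIC[node]
    values = [shallow_value(c, depth - 1, not program_to_move)
              for c in children]
    return max(values) if program_to_move else min(values)


def plausibly_order(node, shallow_depth):
    """Order the successors of a node at which the program is to move.

    Successors with the highest shallow evaluation come first.
    """
    return sorted(TREE[node],
                  key=lambda c: shallow_value(c, shallow_depth, False),
                  reverse=True)


print(plausibly_order("A", 1))  # ['C', 'B', 'D']
```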

The  other  major  purpose  of  game-tree  generating  techniques  is 
simply to generate a reasonable portion of the complete game tree
below  a  given  node;  this  is  in  contrast  to  its  purpose  in  making  sure  that 
the  generated  portion  is  plausibly  ordered.  The  relevant  techniques 
are  forward-pruning  and  the  use  of  termination  criteria;  each  may  be 
considered  a  special  case  of  the  other.  A  game-tree  generating  procedure 
employs  termination  criteria  when  it  decides  not  to  continue  extension  of 
the  game  tree  it  is  generating,  thus  creating  tip  nodes  in  the  game  tree 
Figure 4-6. Using a shallow search technique to plausibly order nodes
B, C, D.

it  produces.  Some  useful  termination  criteria  are  “game  over”  (the 
tip  node  produced  is  actually  a  tip  node  of  the  complete  game  tree), 
“maximum  depth,”  and  “minimum  depth.”  The  maximum-depth  termi¬ 
nation  criterion  is  employed  by  procedures  that  do  not  produce  game 
trees  having  a  depth  greater  than  some  preassigned  value.  Nodes  at  that 
depth  below  the  root  node  automatically  become  tip  nodes  of  the  game 
tree  that  is  produced.  The  minimum-depth  criterion  is  used  by  pro¬ 
cedures  that  do  not  produce  game  trees  having  a  depth  less  than  some 
preassigned  value. 

Game-tree  generating  procedures  that  employ  the  minimum-  and 
maximum-depth criteria generally use other criteria that may override
them.  Thus,  the  minimum-depth  criterion  may  be  overridden  by  the 
“game  over”  criterion.  Similarly,  many  procedures  make  use  of  a  “dead 
position”  criterion,  specifying  that  a  node  will  not  be  considered  a  tip 
node  unless  it  is  “dead” — what  deadness  means  depends  upon  the 
game  being  played.  Thus,  a  situation  in  Checkers  in  which  there  are 
jumps  available  is  “live”  (i.e.,  not  “dead”);  a  situation  in  Chess  in 
which  someone  is  in  check  or  there  are  captures  available  is  live;  a 
situation in Go in which there is a possible “ladder attack” is live. If the
dead-position  criterion  is  not  satisfied,  then  the  maximum-depth 
criterion  will  generally  be  overridden.  Live  nodes  will  always  have  their 
successors  generated  and  evaluated  (unless  the  program  runs  completely 
out  of  time  or  memory  space).  This  is  gc^d  because  it  is  difficult  to 
find  static-evaluation  functions  that  give  accurate  evaluations  for  live 
nodes.® 
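
The way these termination criteria override one another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node record, its field names, and the hard_limit parameter (a stand-in for running out of time or memory space) are hypothetical and are not taken from any of the programs discussed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    depth: int        # distance below the root node
    game_over: bool   # True if this is a tip node of the complete game tree
    live: bool        # True if jumps, captures, checks, etc. are pending


def is_tip(node: Node, max_depth: int, hard_limit: int) -> bool:
    """Return True if generation should stop at this node."""
    if node.game_over:
        return True                  # "game over" overrides every other criterion
    if node.depth >= hard_limit:
        return True                  # assumed stand-in for exhausting time or memory
    if node.live:
        return False                 # a live node always has its successors generated
    return node.depth >= max_depth   # a dead node becomes a tip at maximum depth
```

Under these assumptions a live node at maximum depth is still expanded, while a dead node at the same depth becomes a tip node.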

A  game-tree  generating  procedure  uses  forward  pruning  when 
it  decides  not  to  continue  generating  successors  of  a  node  that  might 
otherwise  be  considered.  Thus,  returning  to  Fig.  4-6,  after  having 
plausibly  ordered  nodes  B,  C,  and  D,  a  game-tree  generating  procedure 
might decide that node D is too implausible to merit further investigation;
the portion of the complete game tree below D would therefore be
“pruned”  from  the  game  tree  produced  by  the  generating  procedure. 
The  time  saved  by  not  generating  or  evaluating  nodes  below  D  can  be 
used  to  search  more  deeply  elsewhere.  In  n-best  forward  pruning,  only 
the  nodes  below  the  n  most  plausible  successors  to  a  given  node  are 
searched  (generated  and  evaluated);  other  successors  are  pruned. 
(Thus, the discussion of Fig. 4-6 might illustrate 2-best forward
pruning.) In tapered n-best forward pruning, the parameter n is decreased
as the depth of the given node increases. Again, the most
plausible successors below the given node may be determined either
by use of a generator or by conducting a shallow search. The time saved
by forward pruning must be weighed against the chance that relevant
portions of the game tree, which might otherwise be considered, will
be pruned out. Thus, the shallow search below node D might have been
misleading; there might have been very valuable nodes farther below.
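
A minimal sketch of n-best and tapered n-best forward pruning follows. The plausibility scores and the particular taper (one fewer successor per level) are assumptions chosen only for illustration; an actual program would obtain plausibilities from a generator or a shallow search and would choose its own taper.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def n_best(successors: Sequence[T], plausibility: Callable[[T], float], n: int) -> List[T]:
    """Keep the n most plausible successors; all others are pruned."""
    ranked = sorted(successors, key=plausibility, reverse=True)
    return ranked[:n]


def tapered_n(depth: int, n_at_root: int = 4, floor: int = 1) -> int:
    """Hypothetical taper: keep one fewer successor per level, never fewer than `floor`."""
    return max(floor, n_at_root - depth)


# Hypothetical plausibility scores for the successors B, C, D of Fig. 4-6.
scores = {"B": 0.9, "C": 0.6, "D": 0.1}
kept = n_best(["B", "C", "D"], scores.get, 2)   # 2-best pruning drops node D
```

The risk described above is visible here: if the score assigned to D is misleading, everything below D is lost.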

We have now examined the basic techniques used by game-playing
programs that use local analysis to determine how they should play
zero-sum, two-person, nonchance games of perfect information. In
this exposition the process by which such a program searches a game
tree to determine its most desirable alternative has been separated into
two parts: a generation procedure that produces plausibly ordered,
reasonable portions of the game tree, and an alpha-beta technique that
evaluates game trees that are supplied to it. In fact, the distinction made
is an artificial one: A truly efficient game-playing program will conduct
both procedures in a simultaneous, or dynamic, fashion. The alert
reader will probably have already suspected that something like this
should be done. After all, why generate nodes if the alpha-beta technique
is later going to decide not to evaluate them?

®The dead-position criterion was first suggested for Chess by Turing (1953).

A  game-playing  program  is  said  to  use  dynamic  generation  and 
evaluation  if  it  applies  the  alpha-beta  technique  as  another  part  of  its 
forward-pruning  and  plausibility-ordering  techniques.  Essentially,  such 
a  program  will  generate  “plausible  branches”  of  the  game  tree,  using  the 
results  of  the  alpha-beta  technique  to  guide  their  generation.  The 
generation of a plausible branch will terminate when it reaches maximum
depth and its tip node is dead. Evaluation is made of each node
in  a  plausible  branch  as  it  is  generated.  After  it  is  evaluated,  the  backed- 
up  values  of  nodes  above  it  in  the  tree  are  changed  accordingly,  using 
the  alpha-beta  technique.  This  may  cause  some  nodes  to  be  pruned 
from further consideration, or change the plausibility orderings of other
nodes (dynamic ordering), or indicate that new plausible branches
should be generated. (A formalization for dynamic search procedures is
given in Nilsson, 1971.) A game-playing program using dynamic search
may approach a reduction of the branching factor of the complete
game tree from B to about the square root of B (note 4-7).
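
The interaction between plausibility ordering and the alpha-beta technique can be illustrated with a small sketch. It is not a dynamic generation procedure in the full sense described above: the tree is given in advance as nested lists (an assumed representation in which an integer is a static evaluation and a list holds a node's successors), and the stats counter is a hypothetical device for counting the nodes examined.

```python
from typing import Dict, List, Optional, Union

Tree = Union[int, List["Tree"]]   # leaf = static value, list = successors


def alphabeta(node: Tree, alpha: float = float("-inf"), beta: float = float("inf"),
              maximizing: bool = True, stats: Optional[Dict[str, int]] = None) -> float:
    """Backed-up value of `node`, skipping branches that cannot affect the result."""
    if stats is not None:
        stats["visited"] = stats.get("visited", 0) + 1
    if isinstance(node, int):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break              # remaining successors are pruned
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break                  # remaining successors are pruned
    return value


well_ordered = [[3, 5], [2, 9], [0, 1]]    # most plausible alternative first
badly_ordered = [[0, 1], [2, 9], [3, 5]]   # most plausible alternative last

good_stats: Dict[str, int] = {}
bad_stats: Dict[str, int] = {}
good_value = alphabeta(well_ordered, stats=good_stats)
bad_value = alphabeta(badly_ordered, stats=bad_stats)
```

Both orderings back up the same value, but the well-ordered tree is evaluated after examining fewer nodes, which is the saving that plausibility ordering is meant to secure.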


CHECKERS 
Checker  Player 

Samuel (1959, 1967) wrote a computer program capable of playing
Checkers at a championship level. The program is capable of beating
all but the very best players, and once beat a Checkers master,
Robert W. Nealey (see Fig. 4-7). This section discusses Samuel’s
Checkers Player.

Samuel’s program conducts an alpha-beta tree search, using forward
pruning. To ensure the effectiveness of the alpha-beta technique,
the Checkers Player does a shallow, breadth-first search to order
alternatives according to plausibility. The overall tree search terminates
whenever a node is at a maximum depth and the program judges it to
be “dead” (i.e., there are no immediate jumps available). During a
game it usually takes the Checkers Player less than a minute to perform
its tree search and decide how it will move. Samuel’s program is unique
in that it is, to some extent, capable of developing its own static-evaluation
function. The Checkers Player is capable of using and selecting
a static-evaluation function that is a composite function of a set of
parametric functions (this is described below). Together with the concepts
presented in the preceding section, this description is sufficient
to show how Samuel’s program plays the game, once it has a good
static-evaluation function. So, this section explains how the Checkers
Player is able to achieve its evaluation function.

Figure 4-7. One of the program’s early victories. (Samuel, 1959, 1967.)

Samuel had three basic problems in constructing a good evaluation
function. He had to determine the proper set of parametric functions;
the proper type of composite evaluation function; and how the program’s
experience should influence it to modify its evaluation function.

The solutions for each of these three problems involved a considerable
amount of heuristic programming. From the standpoint of
bona fide machine learning, perhaps the most important thing would
be to enable the program to create its own set of parametric functions,
since these functions are an inherent limitation on its ability to play the
game and since their specification by an outside source is a substantial
hint as to the proper way of playing. The Checkers Player is not programmed
to do this: All parametric functions are supplied in advance of
its operation, and are carefully chosen for their relevance to the game.
(A typical parameter is MOB (total mobility), equal to the number of
squares to which the Player can potentially move, disregarding forced
jumps.)

The second problem, determining the proper type of composite
evaluation function, has been approached in two ways in different versions
of the Checkers Player. The original approach was to let the
evaluation function be a polynomial of the form $a_1 t_1 + \cdots + a_n t_n$; that
is, a weighted sum of the values of the terms $t_i$. (For example, $3t_1 + 5t_2$
is a polynomial function of $t_1$ and $t_2$: If $t_1 = 4$ and $t_2 = 7$, then the
function has a value of $3 \times 4 + 5 \times 7 = 12 + 35 = 47$.) The greater
the value of the polynomial, the more favorable one’s evaluation of the
configuration in question. This approach has the advantage of permitting
an easy modification of the function, obtained by changing the
weights $a_i$. The disadvantage comes from the linear nature of the polynomial
and the fact that it is not really plausible to assume that the
theoretical evaluation function can be linearly expressed in terms of the
given parametric functions $t_i$. Samuel’s original program overcame this
to some extent with the use of two techniques: First, it was made possible
to introduce new terms that were binary connectives of the previous
ones (i.e., terms that corresponded to logical expressions of the form
$t_i \wedge t_j$, $\neg t_i \vee t_j$, etc.). A second, more recent technique was to divide
the course of the game into six successive phases (determined primarily
by the number of pieces on the board); in each phase a different polynomial
could be used (one with a different set of terms and different
coefficients for each term).
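
The polynomial evaluation function is simply a weighted sum, as the following sketch shows. The dictionary representation and the names t1 and t2 are assumptions made for the example; they reproduce the worked arithmetic given above rather than any actual Checkers parameters.

```python
from typing import Mapping


def polynomial_evaluation(weights: Mapping[str, float], terms: Mapping[str, float]) -> float:
    """Weighted sum a_1*t_1 + ... + a_n*t_n over the named terms."""
    return sum(weights[name] * terms[name] for name in weights)


weights = {"t1": 3, "t2": 5}   # the coefficients a_i
terms = {"t1": 4, "t2": 7}     # values of the parametric functions t_i
value = polynomial_evaluation(weights, terms)   # 3*4 + 5*7
```

Changing the function amounts to changing the entries of weights, which is what makes this form easy to modify; a separate weights table per phase of the game would give the phase-dependent variant.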

Another, more direct method of constructing a nonlinear evaluation
function has been investigated more recently, and is the one
currently used by the Checkers Player. The method consists of constructing
a hierarchy of “signature tables” as follows: First, the possible
values of the parametric functions are restricted; that is, some
parameters are allowed to have only five values (-2, -1, 0, 1, 2) and
the rest are allowed to have only three values (-1, 0, 1). Next, six
collections (called signature types) of parameters are chosen. Each
signature type contains four elements, of which one is a five-value
parameter and the rest are three-value parameters. (Some parameters
may be included in more than one signature type.)

For each signature type, a signature table is to be constructed;
this table lists an evaluation (either -2, -1, 0, 1, or 2) for every
combination of values of the four elements. There are thus 135 entries
(5 x 3 x 3 x 3) in each signature table, every entry being -2, -1, 0, 1,
or 2. (Actually, it is only necessary to include 68 entries in a given
signature table, since the parametric functions are designed to be
“symmetric” for each of the players. If (1, 2, -1, 0) is listed in a given
signature table as having an evaluation of 2, the evaluation of
(-1, -2, 1, 0) is automatically determined to be -2, and it is not
necessary to list it in the table.)

To build the hierarchy, two second-level signature tables are constructed
as in Fig. 4-8, each of which has a second-level evaluation (an
integer from -7 to 7) for all possible combinations of values of the
three first-level tables it describes. There are thus 125 entries in each
of the second-level tables.

Finally, a third-level table assigns an evaluation to each possible
combination of values from the two second-level tables.

To  determine  the  evaluation  of  a  given  board-configuration,  it  is 
necessary  to: 

1.  Determine  the  values  of  each  of  the  parametric  functions 
for  that  particular  configuration. 

2.  Look  in  the  six  first-level  signature  tables  and  find  the  first- 
level  evaluations  of  the  configuration. 

3.  Look  in  the  two  second-level  tables  and  find  the  second-level 
evaluations  of  the  configuration. 

4. Obtain the final evaluation by looking in the third-level signature
table.
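
The four steps above can be sketched as a chain of table lookups. The tables in a real program are learned; here a toy rule (the clamped sum of the inputs) stands in for them. That rule, the range assumed for the third-level evaluation, the grouping of first-level tables 0-2 and 3-5 under the two second-level tables, and the convention that the five-value parameter comes first in each signature are all assumptions made for the example. With one five-value and three three-value parameters, each first-level table has 5 x 3 x 3 x 3 = 135 combinations.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Table = Dict[Tuple[int, ...], int]

FIVE = range(-2, 3)      # five-value parameter: -2 .. 2
THREE = range(-1, 2)     # three-value parameter: -1 .. 1
FIFTEEN = range(-7, 8)   # second-level evaluation: -7 .. 7


def toy_table(input_ranges: Sequence[range], low: int, high: int) -> Table:
    """Placeholder for a learned table: the clamped sum of the inputs."""
    return {
        combo: max(low, min(high, sum(combo)))
        for combo in product(*input_ranges)
    }


first_level: List[Table] = [toy_table([FIVE, THREE, THREE, THREE], -2, 2) for _ in range(6)]
second_level: List[Table] = [toy_table([FIVE, FIVE, FIVE], -7, 7) for _ in range(2)]
third_level: Table = toy_table([FIFTEEN, FIFTEEN], -14, 14)


def evaluate(signatures: Sequence[Tuple[int, int, int, int]]) -> int:
    """Steps 2-4: six first-level lookups, two second-level lookups, one final lookup."""
    level1 = [first_level[i][tuple(signatures[i])] for i in range(6)]
    level2 = [second_level[j][tuple(level1[3 * j:3 * j + 3])] for j in range(2)]
    return third_level[tuple(level2)]
```

Because every level is an arbitrary table rather than a weighted sum, the composite function need not be linear in the parameters, which is the point of the method.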

As a further improvement on the quality of these evaluations,
a different signature-table hierarchy is used for each of the six phases
of the game.

Figure 4-8. The signature-table hierarchy. (Samuel, 1967.)

Learning

The  remaining  question  is  how  these  evaluation  functions,  whether 
polynomials or signature tables, are to be obtained from the game-playing
experience of the program. The method for developing these
functions  constitutes  the  “learning  ability”  of  the  program. 

Work on the Checkers Player has been primarily devoted to two
ways of doing this, referred to as rote learning and learning by generalization.
Rote learning can be accomplished by establishing a large
file of those board configurations and their evaluations that are encountered
during the course of the games the program plays. The
establishment of this file eliminates the need to recompute an evaluation
each time such a configuration arises, so it has the benefit of increasing
the efficiency of the program (provided the search time through the
file is kept low). Learning is effected as follows: If the program is
presented with an arbitrary board configuration and asked to determine
the correct choice for the next move, it will often find in the file some
configurations that are descendants of the configuration in question. The
evaluation for these descendants, before they were put in the file, was
originally made in terms of their descendants, which might ordinarily
be too far away from the original move to be investigated. The evaluation
of the various alternatives for the move can thus be much deeper
than the normal limits on computation would allow (see Fig. 4-9).
The rote-learning method was particularly good at developing the
Checkers Player’s opening and end games.
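
The deepening effect of rote learning can be seen in a small sketch. The toy game below (boards are the integers 0 to 4, each with a single successor, and only board 4 has a nonzero static evaluation) is an assumption invented for the illustration, as are the function and variable names; it is not a model of Checkers.

```python
from typing import Callable, Dict, Iterable


def search(board: int, depth: int,
           successors: Callable[[int], Iterable[int]],
           static_eval: Callable[[int], float],
           rote_file: Dict[int, float],
           maximizing: bool = True, is_root: bool = True) -> float:
    """Minimax to `depth`, using a stored evaluation wherever the file has one."""
    if not is_root and board in rote_file:
        return rote_file[board]          # evaluation remembered from an earlier search
    children = list(successors(board))
    if depth == 0 or not children:
        return static_eval(board)
    values = [search(c, depth - 1, successors, static_eval, rote_file,
                     not maximizing, False) for c in children]
    return max(values) if maximizing else min(values)


def toy_successors(board: int) -> Iterable[int]:
    return [board + 1] if board < 4 else []


def toy_eval(board: int) -> float:
    return 10.0 if board == 4 else 0.0


rote_file: Dict[int, float] = {}
before = search(0, 2, toy_successors, toy_eval, rote_file)        # cannot see board 4
rote_file[2] = search(2, 2, toy_successors, toy_eval, rote_file)  # remembered from play
after = search(0, 2, toy_successors, toy_eval, rote_file)         # now effectively 4 deep
```

The second search from board 0 uses the same depth limit as the first, yet its result reflects a position four levels away, because the stored value for board 2 was itself backed up from two levels below.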

Learning  by  generalization  is  the  technique  that  does  most  of  the 
work  in  constructing  the  evaluation  function,  however.  In  the  case  of  the 
polynomial  type  of  evaluation  function,  the  basic  process  is  that  of 
changing  the  coefficients  for  the  various  terms,  whereas  in  the  case  of 
the  signature-table  function,  the  process  is  that  of  changing  the  various 
entries  in  the  tables.  These  processes  are  accomplished  in  different 
manners,  depending  on  the  use  of  particular  learning  situations  to  which 
the  program  can  be  subjected. 

Learning  Situations  for  Generalization 

The earliest generalization situations for the Checkers Player were
those involving actual play of the game, in which the program was either
employed against human opponents or played against itself. These
situations were used mainly for the development of good evaluation
functions of the polynomial type. In either case, two Checkers-playing
programs were available, called Alpha and Beta (not to be confused
with the alpha-beta technique). Alpha generalized on its learning experience
after each move and would change its coefficients correspondingly,
while the polynomial evaluation function for Beta was kept constant
throughout any given game. Alpha was the program used against
human opponents; the condition of self-playing was effected by playing
Alpha against Beta, generally in a sequence of games, with the stipulation
being that if Alpha won a game its polynomial would be used in
the next game by Beta also, while if Alpha lost too many games in a
row its polynomial would suffer some large, random change. The
purpose of the change was to start the game off in a new direction and
(hopefully) permit the development of a completely new polynomial.

Alpha changed its polynomial as follows: At each move, Alpha
would compute the immediate evaluation of the current board position as
determined by its polynomial. It would also compute a backed-up evaluation
of the current board position, determined by looking ahead in the game
tree and minimaxing backward from the tips of the tree, as defined in
the preceding section.

At any rate, given the immediate evaluation and the
backed-up evaluation, Alpha would adjust the coefficients of its
evaluation function so as to make its new immediate evaluation of the
configuration closer to that it had obtained by the look-ahead method.
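
The idea of moving the immediate evaluation toward the backed-up evaluation can be illustrated with a simple error-correction rule. This rule is a stand-in chosen for clarity and should not be read as Samuel's actual adjustment procedure; the rate parameter and the function names are likewise assumptions.

```python
from typing import List, Sequence


def immediate_evaluation(weights: Sequence[float], terms: Sequence[float]) -> float:
    """Value of the polynomial for the current board position."""
    return sum(w * t for w, t in zip(weights, terms))


def adjust_coefficients(weights: Sequence[float], terms: Sequence[float],
                        backed_up: float, rate: float = 0.01) -> List[float]:
    """Nudge each coefficient so the polynomial moves toward the backed-up value."""
    error = backed_up - immediate_evaluation(weights, terms)
    return [w + rate * error * t for w, t in zip(weights, terms)]


weights = [3.0, 5.0]
terms = [4.0, 7.0]
new_weights = adjust_coefficients(weights, terms, backed_up=60.0)
gap_before = abs(60.0 - immediate_evaluation(weights, terms))
gap_after = abs(60.0 - immediate_evaluation(new_weights, terms))
```

With a sufficiently small rate, each adjustment narrows the gap between the two evaluations for the position at hand; too large a rate would overshoot.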

The  success  achieved  by  this  technique  of  “learning  while  playing” 
was  significant,  although  somewhat  time-consuming.  It  was  particularly 
good  at  developing  the  middle-game  performance. 

Book  Learning 

In  the  normal  operation  of  the  Checkers  Player,  time  spans  on 
the  order  of  a  minute  are  required  for  it  to  make  the  choice  of  a  move. 
This  results  in  a  great  deal  of  time  consumption  and  makes  it  desirable 
that a faster method than “learning while playing” be found to accomplish
the learning process.

The  generalization  technique  was  therefore  explored  in  a  third 
learning  situation,  referred  to  as  book  learning.  Approximately  250,000 
different  board  configurations,  together  with  the  moves  recommended 
for them, were transcribed from the Checkers literature and stored on
magnetic  tape,  and  the  program  was  structured  so  as  to  learn  under  their 
guidance.  This  learning  situation  was  used  for  the  development  of  both 
the  signature-table  evaluation  function  and  the  polynomial  evaluation 
function. 

The  procedures  in  both  cases  were  similar:  Given  a  particular 
board  configuration,  the  program  would  look  at  the  various  alternatives 
for  the  move  and  store  each  of  their  resultant  configurations.  One  of 
the  alternatives  would  be  the  book-recommended  choice. 

Next, in the case of the polynomial function, a table would be
formed, listing each such resultant configuration against the values of
each of the parametric functions when applied to it. Using the table,
a count was made of the number of configurations for which a given
parametric function had a value higher than it had for the book-recommended
configuration; also counted was the number of configurations
for which it had a value lower than that for the book-recommended
configuration. These numbers were added to the cumulative
totals H and L of that particular parameter for all the configurations
that the program had so far considered; a coefficient C for that parameter
was defined to be the ratio $C = (L - H)/(L + H)$. This was the
coefficient associated with the parameter in the polynomial evaluation
function.

Roughly the same thing was done in the development of the
signature tables. These tables listed each resultant configuration against
its values with respect to each of the signature types (i.e., against its
signatures), and cumulative totals D and A were accumulated for each
signature with respect to all the board configurations so far considered,
using the rule that D was increased by one for each signature of an
alternative not recommended by the book, while n (the total number
of nonbook moves) was added to the A total for each signature that
corresponded to a book-recommended move. The correlation coefficient
for a given signature, defined as $C = (A - D)/(A + D)$, was used as
the entry for the signatures that occurred in the third-level table and,
if the signature occurred in a lower level, it was adjusted to fit the values
possible there.
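
The counting procedure for a single parameter of the polynomial can be sketched as follows. The sketch assumes the ratio has the form (L - H)/(L + H), by analogy with the signature-table coefficient (A - D)/(A + D); the data layout, the treatment of ties (ignored), and the value returned when no counts exist are also assumptions.

```python
from typing import Iterable, Sequence, Tuple


def book_coefficient(examples: Iterable[Tuple[float, Sequence[float]]]) -> float:
    """Accumulate H and L for one parameter and return C = (L - H) / (L + H).

    Each example pairs the parameter's value for the book-recommended
    configuration with its values for the alternative configurations.
    """
    high = low = 0
    for book_value, alternatives in examples:
        high += sum(1 for v in alternatives if v > book_value)   # adds to H
        low += sum(1 for v in alternatives if v < book_value)    # adds to L
    if high + low == 0:
        return 0.0   # assumed convention when the parameter never discriminates
    return (low - high) / (low + high)


examples = [
    (2, [1, 0, 2]),   # book move scores at least as high as every alternative
    (1, [0, 3]),      # one alternative scores lower, one higher
]
coefficient = book_coefficient(examples)   # H = 1, L = 3
```

A coefficient near 1 means the parameter is usually larger for the book move than for the alternatives; a coefficient near -1 means the opposite.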

Results 

The coefficient C for a given parameter (or signature) serves as
a cumulative measure of the goodness of the parameter in predicting
the  book  move.  The  book-learning  technique  worked  well,  especially 
for  the  signature-table  type  of  evaluation  function.  After  analyzing 
approximately  175,000  board  situations,  the  Checkers  Player  was  able 
to  predict  book-recommended  moves  with  an  accuracy  of  48%, 
simply  on  the  basis  of  its  evaluation  function,  without  doing  any  tree 
searching.  In  actual  play  the  program  follows  book-recommended  moves 
to  a  much  greater  extent  because  it  uses  tree-searching  techniques. 

These,  then,  were  the  fundamental  heuristics  behind  the  Checkers 
Player’s  approach  to  learning  the  game.  Samuel’s  Checkers  Player  was 
one  of  the  earliest  major  successes  of  AI  research,  being  the  first 
computer  program  to  perform  at  a  championship  level  in  a  difficult 
game  of  strategy.  The  program  improved  to  the  point  where  it  could 
beat  its  own  designer.  It  remains  today  one  of  the  best  achievements 
in  game-playing  programs. 


CHESS  AND  GO 
Chess 

Shannon (1950a, b) was one of the first to point out the impossibility
of using exhaustive search to play Chess, and suggested that
a terminating tree search should be used. Turing (1953) described a
simple Chess program and suggested that termination of the tree search
should  be  governed  by  whether  or  not  the  positions  ultimately  reached 
were  “dead.”  (Turing  defined  a  dead  position  to  be  one  in  which  there 
were  no  immediate  captures  available.)  Since  then,  Chess  programs 
have  been  written  by  Gillogly,  Bernstein,  Bastian,  Newell,  McCarthy, 
and  others.  An  article  by  Good  (1968)  describes  a  “Five-Year  Plan” 
for  the  development  of  an  expert  Chess-playing  program.  Some  of  the 
ideas  mentioned  have  been  implemented,  though  many  deserve  further 
investigation.  One  of  the  best  Chess-playing  programs  to  date  is  that 
of  Greenblatt,  Eastlake,  and  Crocker  (1967);  it  is  usually  referred  to 
either  as  the  “Greenblatt  Chess  Program”  or  as  “Mac  Hack  Six.” 

In  order  to  describe  Greenblatt’s  program,  some  of  the  customary 
terminology  used  by  Chess  players  is  adopted:  We  shall  refer  to  each 
of  the  various  alternatives  for  moving  pieces  on  a  chessboard  that  a 
player  can  legally  use  in  one  turn  as  being  Chess  moves  or,  more  simply, 
moves.  In  all  other  sections  of  this  chapter  the  word  “move”  has  been 
used  in  its  (von  Neumann-Morgenstern,  1944)  game-theoretic  sense,  to 
denote  a  situation  in  which  a  player  can  choose  among  alternatives. 

The  tree  search  done  by  Greenblatt’s  Chess  Player  program  is 
rather  sophisticated,  but  it  can  be  explained  within  the  state-space 
paradigm: The possible board configurations, together with the Chess
moves that allow one to go from one configuration to another, are the
state space of Chess. Greenblatt’s program utilizes heuristic information
in evaluating both the states and the operators of the state space. When
presented with an initial board configuration, the program employs a
plausible-move  generator  to  enumerate  legal  Chess  moves  (operators) 
possible  from  that  configuration  and  to  estimate  the  desirability  of  each 
move. 

The  plausible-move  generator  incorporates  a  large  amount  of 
heuristic  information  in  the  way  it  evaluates  a  given  move.  Basically, 
however,  its  evaluation  of  a  move  is  a  comparison  of  the  positions  and 
pieces  attacked  before  the  move,  to  those  attacked  after  the  move. 
Gains  or  losses  resulting  from  blocking  or  unblocking  pieces  are  taken 
into  account,  and  factors  are  added  to  increase  the  evaluated  plausibility 
of  moves  that  attack  certain  weak  spots  (for  example,  pinned  pieces). 
The  evaluation  also  incorporates  very  specific  heuristic  information, 
such  as:  “It  is  bad  to  move  pieces  in  front  of  center  pawns  on  their 
original  squares.” 

The moves are ordered according to the score they receive from
the plausible-move generator, and some of them are selected for
further consideration. The board configuration resulting from the first
of these moves is calculated and the plausible-move generator is applied
to it; the process continues to a preset depth, at which point an
evaluation function is applied to the resultant board configuration. If
there are many pieces in danger (en prise), the plausible-move generator
is applied again and the analysis is carried down another level of the
tree. Otherwise, the evaluation function returns a value for the configuration,
dependent upon a comparison of the pieces held by each of the
players, how much their pieces have changed since the initial configuration,
the presence or absence of certain “pawn structures,” the safety of
the kings, the extent to which the two sides control the center, and the
number of plausible captures that can be made from the position.
(Plausible captures are investigated in a manner similar to that for
plausible moves.)

Thus,  the  tree  search  of  Greenblatt’s  program  terminates  at  a 
depth  dependent  upon  the  configurations  themselves  and  the  extent  to 
which  there  are  or  are  not  pieces  en  prise  (see  Turing’s  “dead”  position 
idea,  described  in  the  section  “Generating  Game  Trees”).  Similarly,  the 
width  of  the  tree  search  is  tapered  (see  the  second  section  of  this 
chapter)  so  that  at  successive  levels  of  the  tree  the  number  of  plausible 
moves  from  each  configuration  considered  for  further  investigation  is 
15, 15, 9, 9, 7, ... (all levels below the fifth have a branching factor
of 7).^ However, the width at any level can be expanded if there is
heuristic information that an important move (a check, for example)
is being ignored. The alpha-beta technique is used throughout the
generation of the game tree so that the investigation of many plausible
moves is obviated. (It is estimated that the use of the alpha-beta technique
reduces the amount of computation by a factor of 100.) Also, the
program avoids considering the same board configuration twice by
maintaining  a  table  of  those  configurations  it  has  already  encountered 
and  evaluated.  Finally,  the  program  contains  a  table  of  “book  openings,” 
which  provides  it  with  the  moves  recommended  by  human  experts  for 
board  configurations  that  often  occur  during  the  beginnings  of  Chess 
games. 
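
Two of the devices just described, the tapered search width and the table of configurations already evaluated, are easy to sketch. The code below is an illustration only and is not drawn from Greenblatt's program: the names are hypothetical, levels are assumed to be numbered from 1, and positions are assumed to be representable as hashable values.

```python
from typing import Callable, Dict, Hashable

TOURNAMENT_WIDTHS = (15, 15, 9, 9, 7)   # widths quoted above for levels 1 to 5


def plausible_width(level: int) -> int:
    """Plausible moves kept at a tree level; levels past the fifth keep 7."""
    index = min(level, len(TOURNAMENT_WIDTHS)) - 1
    return TOURNAMENT_WIDTHS[index]


class EvaluatedPositions:
    """Table of configurations already encountered and evaluated."""

    def __init__(self) -> None:
        self._values: Dict[Hashable, float] = {}
        self.evaluations = 0   # how many times the evaluator was actually run

    def value(self, position: Hashable, evaluate: Callable[[Hashable], float]) -> float:
        if position not in self._values:
            self._values[position] = evaluate(position)
            self.evaluations += 1
        return self._values[position]


widths = [plausible_width(level) for level in range(1, 8)]
table = EvaluatedPositions()
first = table.value("some position", len)    # evaluator runs
second = table.value("some position", len)   # answered from the table
```

The second request for the same position is served from the table, so a configuration reached by two different move orders is evaluated only once.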

In 1967 the program was given a tournament rating of about 1,400.
(The mean of all United States tournament players is about 1,800; the
mean of all Chess players, about 900.) In April 1967, the program
won the Massachusetts Class D amateur trophy. The program has been
continually improved and at present wins at least 80% of its games
against nontournament players. In 1969, Good estimated that the program
would play about 2,000 in England. The program is an honorary
member of the United States Chess Federation, under the name of Mac
Hack Six. Figure 4-10 shows Mac Hack Six winning the first game
of tournament Chess to be won by a computer.

^ During nontournament play the program typically expands its plausible
game tree with a constant branching factor of 6.

Mac Hack Six is not a “learning” program in the sense of Samuel’s
Checkers Player. It is, however, one of the “skillful” programs so far
produced by AI research (see Chapter 3). The level of skill of Greenblatt’s
program, relative to that attainable by humans, is probably not
as great as that attained by Samuel’s Checkers Player or by the DENDRAL
program of Feigenbaum et al. (1971), but it is still considerable; with
more development, Mac Hack Six may reach the master tournament level.


Figure 4-10. First game won by computer in tournament competition:
Game 3, Tournament 2, Massachusetts State Championship, 1967.
White is Mac Hack Six; Black is a human rated 1510.
(Greenblatt et al., 1967, reprinted with permission.)

The  Game  of  GO 

Of  all  the  various  perfect-information  board  games  described 
previously,  GO  is  probably  the  most  difficult  (see  the  second  section  of 
this chapter). No really successful GO-playing program has yet appeared.
However,  Thorp  and  Walden  (1970)  investigated  some  of  the  logical 
aspects  of  the  game,  Zobrist  (1969)  described  a  program  that  plays  a 
legal  game  and  has  “reached  the  bottom  rung  of  the  ladder  of  human 
GO  players,”  and  Ryder  (1971)  described  a  program  that  uses  heuristic 
search  techniques  to  play  a  “fair  beginner’s”  game. 

The rules of GO are fairly simple to state: The game is played on a
19 x 19 board (see Fig. 4-11) between two players, each of whom
has an unlimited number of stones, the stones of one player being white
and those of the other being black. The players alternate in making
moves. In a given move, a player may place a stone on any unoccupied
intersection of the board (subject to two restrictions, described below)
or he may pass. The game is over if the two players pass in succession.
Stones of the same color which form a connected string lying along a
row or a column of the board are said to form a chain. The breathing
spaces of a chain are the empty intersections adjacent (by row or
column adjacency; diagonal adjacency is not sufficient) to the chain.
When a player places a stone on the board, he may not form a chain
without breathing spaces, unless he is capturing. He may not capture a
stone that has captured one of his stones on the preceding turn, unless
he also captures one or more additional stones. Otherwise, if a chain has
no breathing spaces, it is captured by the opponent. At the end of the
game a player’s payment is equal to the number of intersections surrounded
by his stones plus the number of his opponent’s stones that
he has captured. The technique in capturing stones is to maneuver so
that the chains of one’s opponent have no breathing spaces.

Figure 4-11. An illustration of GO. (Courtesy of E. Fiala and H. E.
Sturgis, Xerox Palo Alto Research Center.)
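
The notions of chain and breathing space lend themselves to a short sketch. It assumes that a chain is grown by following row and column adjacency from stone to stone, and that the board is held as a dictionary from (row, column) pairs to a color; both are choices made for the illustration and are not taken from any of the GO programs discussed.

```python
from typing import Dict, Iterator, Set, Tuple

Point = Tuple[int, int]


def neighbors(point: Point, size: int) -> Iterator[Point]:
    """Row and column neighbors on the board; diagonals do not count."""
    row, col = point
    for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < size and 0 <= c < size:
            yield (r, c)


def chain_and_breathing_spaces(stones: Dict[Point, str], start: Point,
                               size: int = 19) -> Tuple[Set[Point], Set[Point]]:
    """Return the chain containing `start` and the empty intersections next to it."""
    color = stones[start]
    chain: Set[Point] = {start}
    spaces: Set[Point] = set()
    frontier = [start]
    while frontier:
        point = frontier.pop()
        for q in neighbors(point, size):
            if q not in stones:
                spaces.add(q)            # an empty adjacent intersection
            elif stones[q] == color and q not in chain:
                chain.add(q)             # same color: the chain extends
                frontier.append(q)
    return chain, spaces


stones = {(0, 0): "black", (0, 1): "black",
          (1, 0): "white", (1, 1): "white", (0, 2): "white"}
black_chain, black_spaces = chain_and_breathing_spaces(stones, (0, 0))
white_chain, white_spaces = chain_and_breathing_spaces(stones, (1, 0))
```

In this position the black chain in the corner has no breathing spaces and would be captured, while the white chain beneath it still has three.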

Zobrist’s  program  uses  pattern-recognition  techniques  (see  the 
next chapter) to aid its investigation of a given GO board configuration.
It possesses 85 “templates,” which are capable of matching configurations
of stones already on the board, and either suggest places for the
program  to  place  its  stone  or  suggest  areas  in  which  the  program  should 
conduct  a  limited  look-ahead.  When  the  program  does  look  ahead,  it 
does  not  perform  an  extensive  tree  search. 

Ryder’s program represents a departure from the strict alpha-beta,
heuristic tree-search techniques that have worked so well for
Checkers and Chess, and comprises a unification with recent developments
in pattern recognition and problem solving. At least two aspects
of its operation are significant: First, it is designed to recognize
recursively defined features of configurations of stones on the board.
Second, the program is a goal-oriented plan for playing GO: It is capable
of  establishing  and  rejecting  limited  goals  (e.g.,  “target  captures”)  and 
of  searching  for  move  sequences  (“tactics”)  that  will  lead  to  them.  The 
recognition  of  recursively  defined  patterns  has  been  investigated  by 
Morofsky  and  Wong  (1971),  and  by  Hewitt  (1968  et  seq.). 

GO  is  an  extremely  difficult  game  to  play.  It  may  be  several  years 
before  a  program  can  be  written  that  will  be  “skillful”  at  playing  the 
game,  even  at  an  amateur  level  comparable  to  the  current  Greenblatt 
Chess-playing  program. 


POKER  AND  MACHINE  DEVELOPMENT 
OF  HEURISTICS 

Waterman  (1968)  designed  a  language  in  which  heuristics  for 
Draw Poker could be expressed as sentences, and he attempted to construct
a program that could select the appropriate sentences under the
guidance  of  experience.  Waterman’s  Poker-playing  program,  though 
perhaps  not  as  well  known  as  other  game-playing  programs,  is  one  of 
the  few  such  programs  to  differ  significantly  in  its  approach  from  the 
Checkers  Player. 

He distinguished between two types of heuristic, heuristic rules and
heuristic definitions. A heuristic rule specifies an action to be taken and
the type of situation that prompts taking the action. Heuristic definitions
define terms that may occur in the statement of other heuristic rules
or definitions. A heuristic rule in Poker, for example, could be a statement
such as: “If the pot is high, call”; the term high could be defined
with the use of a heuristic definition such as “the pot is high if it is
greater than or equal to B,” and the term B also could be defined by a
heuristic definition like “B equals 1000.”

Waterman’s  program  works  within  the  state-space  paradigm  for 
the  statement  of  problems.  Poker  states  (the  “hand”  one  holds,  the  bids 
that  have  been  made,  etc.)  are  described  by  vectors.  Given  an  input 
state-vector $v = (v_1, \ldots, v_n)$, the problem for the Poker-player program is
to decide upon an output state-vector that is both legal according to
the rules of Poker and desirable from the program’s standpoint of trying
to do well in the game. (Thus, a legal output state-vector may include
a change in the program’s current bid.) The Poker-playing program
develops an ordered list of heuristic rules and definitions, which
we shall call a heuristic block, that specifies an output state-vector for
each input state-vector. (Waterman’s program is an example of a program
that develops subprograms. We discuss various aspects of this
subject  in  Chapters  6  and  7.) 

In  Waterman’s  Poker  player  the  general  expression  for  a  heuristic 
rule  is  of  the  form 

$$(V_1, \ldots, V_n) \rightarrow (f_1(v), \ldots, f_n(v))$$

where each $V_i$ represents a set of values for the corresponding variable $v_i$.
Essentially, such an expression says: “Whenever the state vector $v$ is
such that $v_1$ is a member of the set $V_1$, ..., and $v_n$ is a member of the
set $V_n$, the resultant state vector $v'$ is defined to be
$v' = (f_1(v), \ldots, f_n(v))$.” A heuristic definition is either an expression
of the form “A1 → A, A = 2,” which means that an element a is considered a
member of the set A1 if it belongs to the set A and if a = 2, or an
expression such as “X → K1 + F,” which means that X is defined by the
sum of K1 and F.

The  first  step  in  executing  a  heuristic  block  is  to  compare 
the  input  state-vector  with  all  the  heuristic  definitions  until  the  most 
general description possible (in the heuristic terms that have been defined)
of the state vector is obtained; this description can now be
matched against the left-hand sides of the heuristic rules. The convention
is adopted that the description of the state vector is to be compared
with heuristic rules, in order from the top down, until a match
is  made,  at  which  point  the  appropriate  action  is  taken. 

To illustrate, suppose the input state-vector is (3,2,4) and the
heuristic block is as follows (where the asterisk means that any value
is acceptable):

Heuristic Rules

    (…) → (…, V2, *)
    (A2, *, C1) → (V1 + 1, *, *)
    (*, B2, *) → (*, *, V1 + 3)

Heuristic Definitions

    A1 → A, A > …
    A2 → A, A < …
    B1 → B, B ≥ 2
    B2 → B, B < 3
    C1 → C, C = 5

    X → K1 × D

    A → a, a a member of (1,2, ...)
    B → b, b a member of (1,2, ...)
    C → c, c a member of (1,2, ...)

Comparison of the input with the heuristic definitions yields
(A2,(B1,B2),C) as the description of the state vector (which is to be
read: “the input is in the situation A2, either B1 or B2, and C”). This
description is compared with the left-hand sides of the heuristic rules;
the first rule it is found to match is (*,B2,*) → (*,*,V1 + 3), so the
output vector is (3,2,6). (Provisions can be made to establish constants
that are fixed within the system, such as K1, or to allow variables
and constants that can be updated, such as Z or D.)
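
The execution of a heuristic block described above can be sketched as follows. This is a minimal illustration, not Waterman’s implementation. The definitions of B2 and C1 and the two rules named in the text are taken from the example; the thresholds for A1, A2, and B1, and the whole of the first rule, are hypothetical values chosen only so that (3,2,4) receives the description (A2,(B1,B2),C). The names `DEFINITIONS`, `RULES`, `describe`, and `execute` are likewise assumptions.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Tuple[int, int, int]

# Heuristic definitions: symbolic value -> (index of variable, membership test).
DEFINITIONS: Dict[str, Tuple[int, Callable[[int], bool]]] = {
    "A1": (0, lambda a: a >= 4),   # assumed threshold
    "A2": (0, lambda a: a < 4),    # assumed threshold
    "B1": (1, lambda b: b >= 2),   # assumed threshold
    "B2": (1, lambda b: b < 3),    # from the text
    "C1": (2, lambda c: c == 5),   # from the text
}

# Heuristic rules, top down. Left side: one symbol per variable, "*" = any value.
# Right side: one function of the whole state vector per variable, None = unchanged.
RULES = [
    (("A1", "B1", "C1"), (None, None, None)),               # placeholder rule (assumed)
    (("A2", "*", "C1"), (lambda v: v[0] + 1, None, None)),  # (A2,*,C1) -> (V1+1,*,*)
    (("*", "B2", "*"), (None, None, lambda v: v[0] + 3)),   # (*,B2,*) -> (*,*,V1+3)
]


def describe(v: State) -> List[Set[str]]:
    """Collect, for each variable, every symbolic value its number belongs to."""
    description: List[Set[str]] = [set() for _ in v]
    for name, (index, test) in DEFINITIONS.items():
        if test(v[index]):
            description[index].add(name)
    return description


def execute(v: State) -> Optional[State]:
    """Apply the first rule whose left-hand side matches the description of v."""
    description = describe(v)
    for lhs, rhs in RULES:
        if all(s == "*" or s in description[i] for i, s in enumerate(lhs)):
            return tuple(v[i] if f is None else f(v) for i, f in enumerate(rhs))
    return None  # no rule matched


print(execute((3, 2, 4)))   # (3, 2, 6)
```

Because the rules are tried from the top down, the position of a rule in the list is part of its meaning; this ordering is what the block-modifying operations exploit.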

Given  this  framework  for  the  description  and  implementation  of 
heuristics,  essentially  four  operations  can  be  applied  to  a  heuristic 
block  to  produce  a  new  block. 

First, a given heuristic rule can be modified to match a vector v by
enlarging some of the sets in the left-hand part of the rule expression.
Second, such a rule can be modified by making one or more
variables irrelevant (introducing an asterisk in the left-hand part of the
expression), again to ensure that it matches a given vector v. Third, if a
rule is found to cause an error (i.e., if experience should indicate that
there are situations for which it prescribes a wrong action), it can be
modified so as to not match a given vector $(v_1, \ldots, v_n)$ and a rule below
it can be modified to match it (in both cases by altering sets $V_i$
in the left-hand part of their expressions). Finally, an error-causing
rule can be overridden by inserting a new heuristic rule directly above
it in the heuristic block.

Before these operations can be applied, some questions need to
be answered. The most obvious question is, “What is an acceptable output
vector for the given input?” Two others are, “What sets are relevant?”
and “How should they be changed?” For example, if one is
given the information that (4,2,4) is an acceptable output vector, that
C1 is a relevant set, and that C1 should be made to include more values,
then one can determine that “C1 → C, C = 5” is the heuristic which
should be changed and that “C1 → C, C ≥ 4” is a heuristic definition
which can be substituted in its place. In this example there is nothing
further to be done: The input vector (3,2,4) is now represented as
(A2,(B1,B2),C1), and this symbolic description is matched by the
second heuristic rule,

(A2, *, C1) → (V1 + 1, *, *)

with the result that (4,2,4) is the output vector.
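
The substitution of one heuristic definition for another can be sketched in the same spirit. The code below is an illustration under assumptions, not Waterman’s program: definitions are stored as closed intervals, the A2 threshold is hypothetical, only the two rules named in the text are included, and the new definition of C1 is read as C ≥ 4 so that the value 4 falls inside the enlarged set.

```python
import math

# Heuristic definitions as closed intervals: name -> (variable index, low, high).
DEFINITIONS = {
    "A2": (0, -math.inf, 3),   # a < 4; threshold assumed for illustration
    "B2": (1, -math.inf, 2),   # b < 3
    "C1": (2, 5, 5),           # c = 5
}

# Only the two rules quoted in the text; right-hand None means "unchanged".
RULES = [
    (("A2", "*", "C1"), (lambda v: v[0] + 1, None, None)),
    (("*", "B2", "*"), (None, None, lambda v: v[0] + 3)),
]


def matches(symbol, i, v, definitions):
    """True if variable i of v belongs to the set named by `symbol`."""
    if symbol == "*":
        return True
    index, low, high = definitions[symbol]
    return index == i and low <= v[i] <= high


def execute(v, definitions, rules=RULES):
    """Apply the first rule, from the top down, whose left side matches v."""
    for lhs, rhs in rules:
        if all(matches(s, i, v, definitions) for i, s in enumerate(lhs)):
            return tuple(v[i] if f is None else f(v) for i, f in enumerate(rhs))
    return None


def substitute(definitions, name, low, high):
    """Return new definitions in which `name` is redefined as [low, high]."""
    index, _, _ = definitions[name]
    updated = dict(definitions)
    updated[name] = (index, low, high)
    return updated


before = execute((3, 2, 4), DEFINITIONS)
trained = substitute(DEFINITIONS, "C1", 4, math.inf)   # C1 -> C, C >= 4
after = execute((3, 2, 4), trained)
print(before, after)   # (3, 2, 6) (4, 2, 4)
```

Nothing in the rules themselves changes; enlarging one set is enough to let an earlier rule intercept the input.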

How  is  the  program  to  extract  from  its  experience  the  answers  to 
these questions? It is possible to supply this information from the outside,
in which case one might say the program is being trained; Waterman
investigated this approach and achieved a Poker-playing program
that  could  play  a  better-than-average  game.  (See  Table  4-1  for  the  rules 
used  by  Poker  Player.)  Waterman  also  investigated  ways  the  program 
could  infer  the  necessary  information  on  its  own,  although  his  approach 
did  not  completely  free  the  program  from  dependence  on  outside  help. 
He  was  able  to  structure  the  program  so  that  it  could  solve  the  first 
two  questions  and  then,  with  the  aid  of  a  decision  matrix  given  to  it  by 
the  programmer,  solve  the  third  question.  (Waterman’s  use  of  a  decision 
matrix parallels Newell and Simon’s use of a “difference table” in GPS.)
Given  this  decision  matrix,  the  program  was  capable  of  “learning”  to 
play  a  fair  game  of  Draw  Poker,  although  its  success  at  learning  poker 
was not nearly as dramatic as the success of Samuel’s program at learning
checkers. Waterman’s program is distinct from the game-playing
programs  discussed  in  previous  sections  in  that  it  plays  a  game  of 
“imperfect  information.” 

The problem of designing a program that develops its own heuristics
is still unsolved. Again, perhaps the only thing clear is that such a
program would have to be guided by heuristics, and that it will eventually
be necessary for AI researchers to consider the nature of heuristics
that develop heuristics.


BRIDGE 

A  recent  program  by  Wasserman  (1970b)  is  capable  of  bidding 
skillfully  in  the  game  of  Contract  Bridge.  Bridge  bidding  is  a  significant 
intellectual task, involving imperfect information and requiring an
ability to work and communicate with a partner. Wasserman’s program

TABLE 4-1. Rules for Poker Used by Waterman’s Program

Definitions of State-Vector Variables and Symbolic Values

VDHAND:      the value of your hand
POT:         the amount of money in the pot
LASTBET:     the amount of money last bet
BLUFFO:      a measure of the probability that the opponent can be bluffed
POTBET:      the ratio of the money in the pot to the amount last bet
ORP:         the number of cards replaced by the opponent
OSTYLE:      a measure of conservative style by the opponent
OH:          the expected value of the opponent’s hand
OB:          a measure of the probability that the opponent is bluffing
CS:          a measure of conservative style by the opponent
BO:          a measure of the probability that the opponent can be bluffed
LAP:         the largest bet possible without causing the opponent to drop
SB:          a small bet
MB:          a medium size bet
BB:          a large bet made in an attempt to bluff the opponent
BBS:         a small bluff bet
BBL:         a large bluff bet
OAVGBET:     the average bet made during a round of play
OTBET:       the number of bets made by the opponent during a round of play
OBLUFFS:     the number of times the opponent was caught bluffing
OCORREL:     a measure of the correlation between the opponent’s hands and bets
OD:          the number of times the opponent has dropped
SW:          a sure-to-win hand
EC:          an excellent-chance-of-winning hand
GC:          a good-chance-of-winning hand
PC:          a poor-chance-of-winning hand
NC:          a no-chance-of-winning hand
K1 to K31:   constants


1.  a. (SW P8 B5 * * * *) → (* POT+(2×LASTBET) 0 * * * *)        call
    b. (SW * * * * * *) → (* POT+(2×LASTBET) LAP * * * *)        bet

2.  a. (EC P1 B5 * * * *) → (* POT+(2×LASTBET) 0 * * * *)        call
    b. P1 → P, P > K1                                            bf
    c. B5 → B, B > 0                                             bf
    d. (EC * * * * * *) → (* POT+(2×LASTBET) LAP * * * *)        bet

3.  a. (GC P2 B5 * * OR1 *) → (* POT+(2×LASTBET) 0 * * * *)      call
    b. P2 → P, P > K2                                            bf
    c. OR1 → R, R = 0 or 1                                       bf
    d. (GC P9 B6 * * OR1 *) → (* POT+(2×LASTBET) 0 * * * *)      call
    e. P9 → P, P > 15                                            bf
    f. B6 → B, B > 7                                             bf
    g. (GC * B5 * * OR2 CS1) → (* POT+(2×LASTBET) 0 * * * *)     call
    h. OR2 → R, R = 2                                            bf
    i. CS1 → OCS, OCS > K3                                       bf
    j. (GC P3 B5 * * OR3 *) → (* POT+(2×LASTBET) 0 * * * *)      call
    k. P3 → P, P > K4                                            bf
    l. OR3 → R, R = -1                                           bf
    m. (GC * * BO1 * OR3 *) → (* POT+(2×LASTBET) SB * * * *)     bet
    n. BO1 → BFO, BFO > K5                                       bf
    o. (GC P4 B5 * * * *) → (* POT+(2×LASTBET) 0 * * * *)        call
    p. P4 → P, P > K6                                            bf
    q. (GC P9 B7 * * * *) → (* POT+(2×LASTBET) 0 * * * *)        call
    r. B7 → B, B > 10                                            bf
    s. (GC * * * * * *) → (* POT+(2×LASTBET) MB * * * *)         bet

4.  a. (PC * B5 * PB2 OR4 *) → (* POT+(2×LASTBET) 0 * * * *)     call
    b. PB2 → PB, PB > 1                                          bf
    c. OR4 → R, R = 0                                            bf
    d. (PC * B5 * PB2 OR2 CS2) → (* POT+(2×LASTBET) 0 * * * *)   call
    e. CS2 → OCS, OCS > K7                                       bf
    f. (PC P6 B9 BO1 PB3 OR6 *) → (* POT+(2×LASTBET) BB * * * *) bet
    g. P6 → P, P < K14                                           bf
    h. B9 → B, B < 5 ∧ B ≠ 0                                     bf
    i. PB3 → PB, PB > 3                                          bf
    j. OR6 → R, R ≠ -1                                           bf
    k. (PC P5 B2 BO2 * * *) → (* POT+(2×LASTBET) BB * * * *)     bet
    l. P5 → P, P < K9                                            bf
    m. B2 → B, B < K10                                           bf
    n. BO2 → BFO, BFO > K11                                      bf
    o. (PC * B8 * PB4 OR6 *) → (0 * 0 * * * *)                   drop
    p. B8 → B, B > 9                                             bf
    q. PB4 → PB, PB < 2                                          bf
    r. (PC * B5 * * * *) → (* POT+(2×LASTBET) 0 * * * *)         call
    s. (PC * * * * * *) → (* POT+(2×LASTBET) SB * * * *)         bet

5.  a. (NC * * * * OR4 *) → (0 * 0 * * * *)                      drop
    b. (NC * * * * OR2 CS3) → (0 * 0 * * * *)                    drop
    c. CS3 → OCS, OCS > K12                                      bf
    d. (NC P10 B9 BO1 * OR7 *) → (* POT+(2×LASTBET) BBS * * * *) bet
    e. P10 → P, P < 13                                           bf
    f. OR7 → R, R = 3                                            bf
    g. (NC P6 B4 BO3 * OR6 *) → (* POT+(2×LASTBET) BBL * * * *)  bet
    h. P6 → P, P < K14                                           bf
    i. B4 → B, B < K15                                           bf
    j. BO3 → BFO, BFO > K16                                      bf
    k. (NC * B5 * PB1 * *) → (* POT+(2×LASTBET) 0 * * * *)       call
    l. PB1 → PB, PB > K17                                        bf
    m. (NC P7 B9 * * * *) → (* POT+(2×LASTBET) 0 * * * *)        call
    n. P7 → P, P < K32                                           bf
    o. (NC P7 B3 * * * *) → (* POT+(2×LASTBET) SB * * * *)       bet
    p. B3 → B, B < K13                                           bf
    q. (NC P6 B3 * * OR6 *) → (* POT+(2×LASTBET) SB * * * *)     bet
    r. (NC * * * * * *) → (0 * 0 * * * *)                        drop

6.  SW → H, H - OH > K18 and H > K19                             bf
7.  EC → H, H - OH > K18 and H < K19                             bf
8.  GC → H, K20 < H - OH < K18                                   bf
9.  PC → H, K21 < H - OH < K20                                   bf
10. NC → H, H - OH < K21                                         bf
11. OH → K22 - (K23 × OAVGBET × OTBET × OB)                      ff
12. OB → (K24 × OBLUFFS) - (K25 × CS)                            ff
13. CS → (K26 × OCORREL) + (K27 × OD)                            ff
14. BO → (K28 × CS) - (K29 × OH)                                 ff
15. LAP → K30 - (K31 × BO)                                       ff
16. SB → random(1,5)                                             ff
17. MB → random(3,9)                                             ff
18. BBS → random(10,15)                                          ff
19. BB → random(8,14)                                            ff
20. BBL → random(14,20)                                          ff
21. H → VDHAND, VDHAND > 0                                       bf
22. P → POT, POT > -1                                            bf
23. B → LASTBET, 0 < LASTBET < 21                                bf
24. BFO → BLUFFO, BLUFFO < 0 ∨ BLUFFO > 0                        bf
25. PB → POTBET, POTBET > 0                                      bf
26. R → ORP, -1 < ORP < 4                                        bf
27. OCS → OSTYLE, OSTYLE < 0 ∨ OSTYLE > 0                        bf

Values of Constants K1 through K32

The values of the constants used in defining the production rules representing the
heuristics for Draw Poker are given below.

K1  = 40      K17 = 4
K2  = 22      K18 = 27
K3  = 1       K19 = 376
K4  = 9       K20 = 10
K5  = 5       K21 = 0
K6  = 30      K22 = 6
K7  = 1       K23 = .05
K8  = 6       K24 = 1
K9  = 23      K25 = 2
K10 = 7       K26 = 1
K11 = 10      K27 = 2
K12 = 1       K28 = 8
K13 = 1       K29 = 1
K14 = 21      K30 = 5
K15 = 4       K31 = 1
K16 = 20      K32 = 8

Source: From Waterman (1968). Reprinted with permission.

achieves the level of human experts in partnership bidding and is estimated
to be slightly more skillful at competitive bidding than is the
average duplicate Bridge player. The program is capable of bidding
skillfully according to four systems: Standard American, Goren,
Schenken, and Kaplan-Sheinwold (an ability few humans possess).
Figure 4-12 shows Wasserman’s program bidding all four hands (independently)
of a random deal of the cards.

In  March  1969,  the  program’s  competitive  bidding  ability  was 
tested  against  two  human  players  who  had  often  played  as  partners, 
one a Life Master having approximately 1,000 points, the other possessing
nearly 100 points. The contest was conducted in two sessions,
with 15 hands being bid in each session. (Hands and scoring information
were obtained from the American Contract Bridge League National
Tournament,  held  at  Cleveland  in  March  1969.)  The  program  won  one 
session  and  lost  the  other,  being  defeated  overall  by  a  score  of  388.50  to 
361.50. 

Wasserman’s  program  is  similar  to  Greenblatt’s  Chess  Player,  and 
Samuel’s  Checkers  Player,  in  that  it  is  designed  to  evaluate  Bridge 
hands,  using  features  and  procedures  similar  to  those  described  by  good 
human  Bridge  players.  Unlike  Samuel’s  program,  the  Wasserman 
Bridge  bidder  does  not  “learn”  to  improve  its  performance.  Even  so, 
Wasserman’s  program  is  significant  because  it  does  perform  a  difficult 
intellectual  task. 


GENERAL GAME-PLAYING PROGRAMS

Ultimately,  the  most  desirable  game-playing  program  would  be 
one  that  could  accept  the  definition  of  any  game  of  strategy  and  which, 
with practice, could learn to play the game with a skill comparable to or
greater than that which people could develop in playing the game. At
the moment, the attainment of a general game-playing program is an
indefinite prospect. However, programs have been written that are
general with respect to certain specific classes of games. In this section a
brief description is given of the classes of games that have been investigated
and the programs that are capable of playing them.

The  first  class  of  games  are  the  positional  games.  These  include 
two-, three-, and n-dimensional Tic-tac-toe, Hex, Go-Moku (not to be
confused with GO), the Shannon switching games (e.g., Bridg-it), and
many  others.  Essentially,  a  positional  game  is  defined  by  three  sets,  say, 
N,  A,  and  B.  The  set  N  is  considered  to  be  a  set  of  positions;  A  and  B 
each contain subsets of N. A positional game is played by two players,
who alternate in choosing elements from N (once chosen, an element
may not be rechosen). The first player tries to construct one of the sets
belonging to A, and the second player tries to construct one of the sets
belonging to B. The winning player is the one who first succeeds in
constructing one of the desired sets. Positional games may involve
elements of aggressive strategy, since one player may choose an element
from N that he knows the other player would like to choose.

                    NORTH
                    S- A Q 6 4
                    H- 10 9 6
                    D- 5 3
                    C- Q 7 6 5

WEST                                    EAST
S- J 9 7                                S- 3
H- A Q 8 5 2                            H- 3
D- 10 6 2                               D- A K Q J 8 7 4
C- A 8                                  C- J 10 9 3

                    SOUTH
                    S- K 10 8 5 2
                    H- K J 7 4
                    D- 9
                    C- K 4 2

SOUTH       WEST        NORTH       EAST
-           PASS        PASS        1 D
DOUBLE      REDOUBLE    1 S         2 D
2 S         3 H         PASS        4 D
PASS        5 D         DOUBLE      PASS
PASS        PASS

Figure 4-12. A complicated and highly competitive bidding sequence.
(Wasserman, 1970, reprinted with permission.)


    1   2   3
    4   5   6
    7   8   9

To  illustrate,  the  positions  in  two-dimensional  Tic-tac-toe  may  be 
numbered  as  shown  by  the  sketch.  The  set  N  for  Tic-tac-toe  may  thus 
be considered equal to {1,2, ..., 9}, while the set A and the set B both
contain the sets

{1,2,3}, {4,5,6}, {7,8,9}, {1,4,7}, {2,5,8}, {3,6,9}, {1,5,9},
and {7,5,3}

A  player  in  the  game  usually  indicates  that  he  has  chosen  a  position  by 
placing  his  “mark”  (which  is  either  an  X  or  an  O)  on  the  position. 
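
The three-set definition of a positional game translates almost directly into code. The sketch below is only an illustration: the function name `play`, the labels "first" and "second", and the choice to replay a fixed list of moves are assumptions made here, not features of the programs cited in this section.

```python
# Two-dimensional Tic-tac-toe as a positional game (N, A, B).
N = set(range(1, 10))
LINES = [frozenset(s) for s in (
    {1, 2, 3}, {4, 5, 6}, {7, 8, 9},      # rows
    {1, 4, 7}, {2, 5, 8}, {3, 6, 9},      # columns
    {1, 5, 9}, {7, 5, 3},                 # diagonals
)]
A = LINES   # sets the first player tries to construct
B = LINES   # sets the second player tries to construct


def play(moves, positions=N, first_sets=A, second_sets=B):
    """Replay alternating choices; return 'first', 'second', or None (no winner)."""
    chosen = {"first": set(), "second": set()}
    targets = {"first": first_sets, "second": second_sets}
    taken = set()
    for turn, element in enumerate(moves):
        player = "first" if turn % 2 == 0 else "second"
        if element not in positions or element in taken:
            raise ValueError(f"illegal choice {element!r}")
        taken.add(element)
        chosen[player].add(element)
        # A player wins as soon as one target set lies inside his choices.
        if any(target <= chosen[player] for target in targets[player]):
            return player
    return None


print(play([1, 4, 2, 5, 3]))   # first
```

Other positional games are obtained by changing only the three sets; for games in which A and B differ, the two players simply receive different target lists.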

Positional games were formalized by Koffman (1967) and have
been studied by many researchers, including Banerji (1970), Citrenbaum,
Pitrat (1971), and Banerji and Ernst (1971). Programs have
been constructed which are capable of accepting the definition of an
arbitrary positional game and, with practice, of “learning” to play the
game quite well. Koffman constructed a program that learns to recognize
sets of important board configurations in 4x4x4 Tic-tac-toe, and
which requires about 12 games before it starts beating its opponents.
Koffman’s program describes a given set of board configurations by
means of a weighted graph. Fig. 4-13a shows a situation in 4x4x4
Tic-tac-toe from which player Z can force a win in six moves; Fig. 4-13b
shows the sequence of moves that leads to the win; Fig. 4-13c shows the
“winning paths” used in the force and their interconnections; and
Fig. 4-13d shows the weighted-graph representation for the situation. Figure
4-14 shows some other winning positions that have the same weighted
graph representation.

A. Winning situation
B. Sequence of moves which forces a win from A
C. Analysis of B in terms of principal rows, columns, and diagonals it uses

Figure 4-13. A winning situation in 4 x 4 x 4 Tic-tac-toe and its graphic
representation. (Koffman, 1967, reprinted with permission.)

Another  general  class  of  game  that  has  received  a  great  deal  of 
study  is  the  nimlike  game,  formalized  by  Berge  in  1962.  A  given  nimlike 
game  consists  of  a  directed  graph  and  a  counter,  initially  placed  on  one 
of  the  nodes  of  the  graph.  The  graph  of  a  nimlike  game  is  required  to 
have  terminal  nodes  and  it  may  not  have  “loops.”  Two  players  alternate 
in  moving  the  counter  from  its  position  to  an  adjacent  node  along  a 
directed  arc.  The  first  player  to  reach  a  terminal  node  wins. 
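
Because the graph of a nimlike game has no loops, the positions from which the player about to move can force a win can be found by working back from the terminal nodes. The sketch below illustrates that idea only; it is not one of the decomposition techniques mentioned in this section, and the function name, the dictionary representation of the graph, and the small example game are assumptions made here.

```python
from functools import lru_cache


def winning_positions(graph):
    """Nodes from which the player about to move can force a win.

    `graph` maps every node to a list of its successors; terminal nodes
    map to an empty list. The graph is assumed to have no loops.
    """
    @lru_cache(maxsize=None)
    def mover_wins(node):
        # At a terminal node the previous mover has already won, so the
        # player "to move" there loses: any() over no successors is False.
        # Otherwise the mover wins if some move leaves the opponent losing.
        return any(not mover_wins(successor) for successor in graph[node])

    return {node for node in graph if mover_wins(node)}


# Example: the counter sits on a number n and may be moved to n-1 or n-2;
# node 0 is the only terminal node.
graph = {n: [m for m in (n - 1, n - 2) if m >= 0] for n in range(7)}
print(sorted(winning_positions(graph)))   # [1, 2, 4, 5]
```

The memoized recursion evaluates each node once, so the work grows with the number of arcs in the graph; for games whose graphs are too large to enumerate, this direct approach is exactly what becomes impractical.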

Nimlike games have been studied by Berge, Banerji and Ernst
(1971). Many techniques for decomposing a given nimlike game into
smaller games (see “problem reduction” in Chapter 3) or for proving
that the strategies of one game can be used for another, have been developed.
However, the description of these techniques involves a considerable
amount of mathematics, and therefore will not be presented
here.

Figure 4-14. Some other winning situations in 4 x 4 x 4 Tic-tac-toe with
the same graph representation. (Koffman, 1970, reprinted with permission.)

The  development  of  general  game-playing  programs  is  hampered 
by  the  fact  that  there  is  as  yet  no  clearly  satisfactory  theory  of  what  it 
means  for  two  games  to  be  “strategically  isomorphic,”  or  of  how  to 
find  simpler  games  that  are  strategically  isomorphic  to  a  more  difficult 
one.  It  seems  likely  that  graphlike  structures  will  turn  out  to  be  a  good 
means  for  describing  classes  of  important  game  situations  in  other  games 
as well as in positional ones. It also seems likely that pattern recognition
and (perhaps) semantic information-processing techniques will
eventually  be  very  valuable  to  the  construction  of  general  game-playing 
programs. 

NOTES 

4-1. Whether computers can have “choice” is a debatable question, but
for us it is largely irrelevant. One might quibble with the ability of computers
to “play” games, on the grounds that their ability to “choose among
alternatives” has not been proved, and that this ability is (at least on the surface)
required in the von Neumann-Morgenstern formalization of the theory
of games. To answer this quibble, we simply note that we are really concerned
with the ability of computers to simulate playing games, not with
whether they “really” make choices, etc. If the reader wishes to pursue the
quibble on its own terms, three facts are offered: (1) Computers can reason
“causally,” that is, take into (perhaps only partial) account the consequences
of various actions; (2) a computer’s decision can be based (perhaps
only partially) on a “random” element; (3) a computer program can
be “self-affecting” (see Chapter 8). Each of these facts serves either to
diminish our ability to say that the computer’s operation is necessarily predetermined
or to increase our ability to say that the computer can have a
“sense of purpose,” that its actions can be “purposeful.” When we combine
fact (1) with fact (3), we come to the conclusion that a computer can
change the way it reasons causally about a problem and, in a sense, display
“free will.” Whether its “will” is really as “free” as ours may seem to depend
upon its abilities to sense and act upon the “real world”; still, we may
note that in certain limited realms of commonly shared sensation and action
(such as games), the computer’s “freedom of will” may roam more widely,
and more successfully, than our own. Thus, Koffman’s computer program
(discussed in the last section of this chapter) could develop its own strategies
for the game of 4x4x4 Tic-tac-toe and within 12 games “learn” to start
beating its human opponents.

4-2.  How  well  can  people  play  games?  This  is  a  very  devious  question: 
Actually  the  significant  thing  seems  to  be  that  people  can  improve  their 
ability to play a game. Are there limits? For example, how close are the
current Chess Grandmasters to playing their game with the optimum strategy?
We know that optimum strategy must exist, but the game-theoretic
procedure for determining what it is lies beyond the bounds of computational
ability. Thus, we don’t really know what the optimum strategy is unless
we can find some better way to compute it. At the moment, all we can
do  is  look  at  people  who  play  Chess  better  than  average  players,  and  even 
their  performance  tells  us  little  about  how  well  the  game  might  be  played 
in  theory. 

From  a  theoretical  standpoint,  there  may  be  limits  to  how  well  a  game 
can  be  played  by  machines.  It  may  be  possible  to  prove  that  there  are  games 
which  cannot  be  played  “perfectly,”  in  a  practical  sense.  Although  such 
games would be finite, their optimum strategies would be beyond the bounds
of computational procedure (using the game-theoretic, enumerative procedure),
and all games that were “strategically isomorphic” to them would
be of at least the same size. (See the last section of this chapter.) Assuming
they could be shown to exist, we might call these games “grin-and-bear-it”
games. Some interesting questions would, of course, be: Are there grin-and-bear-it
games that have finite descriptions (rules and payment functions)
small  enough  so  that  they  can  be  played  by  humans  and  computers?  What 
are  some  grin-and-bear-it  games? 

4-3. Kriegsspiel is played with two players and an “umpire.” Each player
has his own chessboard, which cannot be seen by the other player, but the
umpire can see both chessboards. As in Chess, the players choose oppositely
colored pieces and alternate in making moves. Each player’s board is empty
except for his own pieces, which are initially arranged in the standard formation.
Generally, neither player knows exactly what moves the other
player has made. Instead, when one player (say, A) makes a move, he
makes a sequence of choices. After each choice, the umpire informs him
whether or not the choice is “legal” (i.e., consistent) according to the rules
of ordinary Chess, with the moves so far made by both players. If the
choice is not legal, then it has no effect upon the boards of either player. If
the choice is legal, then the configuration of pieces on A’s board is transformed
accordingly and it becomes the other player’s turn to move. Neither
player hears the choices that are made by the other player.

4-4.  Such  a  game  might  still  have  aspects  of  strategy  and  problem  solving: 
Suppose  the  payment  function  specifies  that  payments  shall  be  received  only 
when the game reaches a terminal node, that is, at the end of a play. Suppose
that for different plays of the game the payment function specifies a
different  “total  payment,”  and  suppose  that  the  payment  function  has  a 
maximum:  that  is,  there  is  a  possible  play  for  which  the  “total  payment”  is 
greater  than  or  equal  to  that  for  any  other  possible  play.  Finally,  suppose 
that  for  any  play  of  the  game  the  payment  function  specifies  that  the  “total 
payment”  is  to  be  divided  equally  among  all  players.  We  then  have  a 
“strictly  noncompetitive”  game  in  which  each  player  has  the  problem  of 
cooperating  with  the  other  players  so  as  to  bring  about  a  play  that  yields 
the  maximum  total  payment.  One  can  design  strictly  noncompetitive  games 
that  are  very  difficult  to  play. 

4-5. Alternatively, one can view a game as a problem in which the solution
is a tree, rather than a sequence, of operators. Usually, such a representation
of the complete strategy for playing a game cannot be stored
explicitly in a computer, but must instead be stored implicitly, as a procedure
for finding the operator to apply in a given situation. The reader
who is familiar with the procedural epistemology of Hewitt (1968 et seq.)
may anticipate with the present author the desirability of writing some
game-playing programs in languages of the PLANNER genus (see Chapters
6 and 7).

4-6. Playing “perfectly” in this sense is really playing cautiously, and is
equivalent to making the assumption that one’s opponent(s) also have infinite
time and resources. In fact, if one has extra knowledge about one’s
opponent,  not  specified  in  the  rules  of  the  game,  it  may  well  be  possible  to 
play  “better  than  perfectly.”  Thus,  in  reality,  a  player  may  intentionally 
choose  an  alternative  that  he  knows  to  have  a  poor  theoretical  value — if 
he  thinks  that  his  opponent  will  not  see  how  to  exploit  his  “mistake”  and 
will  instead  fall  into  a  trap.  As  one  might  expect,  neither  classical  game 
theory  nor  the  field  of  game-playing  programs  currently  being  developed 
by AI research has very much to say about “opponent-oriented” strategies.
The ability to develop such strategies is clearly possessed by intelligent
human game-players, and thus we would expect that AI research might
eventually  program  computers  to  simulate  it.  However,  it  may  be  a  long 
time  before  this  can  happen,  since  the  human  development  of  an  opponent- 
oriented  strategy  often  makes  use  of  knowledge  about  the  opponent  which 
is  not  limited  strictly  to  his  past  performance  at  the  game.  The  development 
of  a  good  opponent-oriented  strategy  would  require  that  the  computer  be 
able  to  make  a  “model”  of  its  opponent’s  game-playing  abilities  and  goals, 
but  computers  currently  do  not  have  the  ability  to  gather,  represent,  or  use 
the  information  necessary  to  make  models  that  would  be  sufficiently  ac¬ 
curate.  For  the  reader  who  is  interested  in  pursuing  this  subject,  Samuel 
(1967)  mentions  the  desirability  of  programming  game-playing  computers 
to  formulate  “deep  objectives”  as  part  of  their  strategies,  and  to  hypothesize 
on  their  opponent’s  deep  objectives.  Colby  and  Tesler  (1969),  Colby  and 
Smith (1969), and Abelson and Carroll (1965) discussed the ability of computers
to simulate human “belief systems” (though not in the context of
game  playing);  Clarkson  (1963)  presented  an  early  program  that  could 
model  human  decisions  about  stock  purchasing.  (There  are  probably  other 
relevant  papers  in  the  field  of  “simulation  of  cognitive  processes”  of  which 
the  present  author  is  not  aware.)  Also,  von  Neumann  and  Morgenstern 
(1944)  treated  the  subject  of  “bluffing”  in  Poker,  although  not  from  a 
“model  making”  standpoint. 

4-7.  The  value  of  the  alpha-beta  technique  is  indicated  by  the  fact  that 
its  use  in  programs  which  play  the  game  of  Kalah  has  evidently  removed 
this  game  from  the  sphere  of  human  dominance;  that  is,  the  Kalah-playing 
programs  are  probably  unbeatable  by  humans,  even  though  the  optimum 
strategy  for  the  game  is  beyond  the  bounds  of  computational  ability  (Kalah 
is,  however,  less  difficult  than  Checkers).  For  further  information  on  Kalah, 
see Russell (1964).


EXERCISES 

4-1.  Estimate  whether  the  complete  generation  and  minimax  evaluation  of  the 
game trees for Chess and Go can be performed by (a) a “conventional” machine;
(b) an “attainable” machine; (c) a “theoretical serial” machine; (d) a “theoretical
parallel” machine (see Chapter 2, “Limits to Computability”). (e) Make the corresponding
estimates as to whether these machines could carry out a dynamic
search  of  the  complete  “reasonable  game  trees”  of  these  games  (see  the  section 
“Checkers”  in  this  chapter). 

4-2.  Investigate  whether  it  is  epistemologically  adequate  to  describe  real-world 
phenomena  as  the  plays  of  a  partially  specified  game,  for  which  it  is  necessary 
to infer some of the rules. Is such a description metaphysically adequate? (See
Chapter  3.) 

4-3. (a) Show how White can move to gain at least a draw.


(b)  What  subproblems  did  you  consider  in  finding  a  solution  to  (a)?  (c)  Discuss 
how  a  computer  might  be  programmed  to  solve  Chess  end-game  problems. 

4-4. (Poker Coins.)* (a) Find the optimal strategy for the game of Poker
Coins,  the  rules  of  which  are: 

(1)  A  player  throws  N  coins;  he  then  puts  one  or  more  aside  and  rethrows 
the  rest. 

(2)  This  throwing  is  repeated  until  he  no  longer  has  any  coins  to  throw  (i.e., 
all  the  coins  have  been  put  aside). 

(3)  Each  of  the  other  players  takes  a  turn  at  throwing  N  coins,  according  to 
rules  1  and  2;  the  winners  are  those  players  with  the  maximum  number 
of  heads. 

(b)  Analyze  Poker  Dice,  which  is  played  according  to  the  same  rules  except  that 
N  dice  are  thrown  and  those  players  with  the  highest  score  are  the  winners. 

4-5.*  (a)  Analyze  Giveaway  Chess,  played  as  follows: 

(1) Captures must be made, although a player may choose which capture to
make,  if  more  than  one  is  available. 

(2)  Pawns  must  be  promoted  to  queens  if  they  reach  the  eighth  row. 

(3)  The  kings  obey  the  same  rules  of  moving  and  capturing  as  in  ordinary 
chess,  but  there  is  no  such  thing  as  “mate,”  and  neither  player  loses  if  his 
king  is  captured. 

(4) The first player to lose all of his pieces wins.

(b) Analyze Escalation Chess, where White gets 1 move, Black 2, White 3, etc. If
a player is in check, he must get out of check on his first move. A player may not
move into check or take his opponent’s king, but he can place his opponent in a
“multiple check,” etc. A player is checkmated if he can’t get his king out of check
on his first move.

* From Beeler, Gosper, and Schroeppel (1972). Reprinted with permission.


Wordy Eye. (Reprinted with permission from the computer artwork of
M. R. Schroeder. Copyright © Bell Laboratories, 1973.) See Example 5-5.
INTRODUCTION

This  chapter  discusses  ways  that  machines  can  simulate  “pattern 
perception.”  Roughly  speaking,  pattern  perception  is  the  ability  to  find 
a  simple,  useful  description  for  something,  given  an  initial  description 
that  is  very  complex,  or  of  low  utility.  In  order  to  find  the  simple  de¬ 
scription,  one  might  make  use  of  some  property  (“form,”  “design,”  or 
“regularity”)  that  is  possessed  by  the  more  complex  description.  If  there 
is  such  a  property,  then  the  complex  description  is  said  to  be  an  example 
of  a  “pattern.”  Pattern  perception  may  operate  on  descriptions  of  either 
physical or abstract things. Thus, it is common to talk of “visual patterns,”
“sound patterns,” “symbol patterns,” and, even, “reasoning patterns.”
Not all of these have been explicitly investigated by AI research.
However,  it  should  be  clear  that  a  machine  which  can  solve  problems 
in  a  real-world  environment  must  be  able  to  make  and  use  descriptions 
of  that  environment.  Machines  can  make  some  descriptions  rather 
easily  (e.g.,  photographs),  but  they  have  difficulty  in  using  them  to 
“understand”  (recognize  and  solve  problems  involving)  what  is  being 
described.  From  a  practical  standpoint,  the  extent  to  which  machines 
are  able  to  perceive  patterns  is  a  limit  to  the  extent  that  they  can  solve 
real-world  problems. 

This  chapter  concentrates  on  the  use  of  machines  to  do  “visual” 
pattern  perception,  or  scene  analysis,  both  because  this  is  the  area  in 
which the largest amount of work has been done to date, and because
there are good grounds for believing visual pattern perception to be
one of the current, major problems confronting AI research. However,
other  types  of  pattern  perception  will  be  discussed  in  the  next  two  sec¬ 
tions  and  in  the  last  section  of  this  chapter.  A  wide  variety  of  approaches 
have  been  followed  toward  visual  pattern  perception  by  machines.  An 
attempt  will  be  made  to  summarize  some  of  the  most  important  ap¬ 
proaches  and  indicate  the  ways  in  which  each  approach  is  related  to 
the  others.  However,  there  is  not  space  in  this  chapter  for  a  complete 
survey  of  the  subject.  For  a  more  complete  summary  of  vision  systems, 
refer  to  the  book  by  Duda  and  Hart  (1973),  and  the  survey  papers  by 
Rosenfeld  (1972)  and  Turner  (1971). 


SOME  BASIC  DEFINITIONS  AND  EXAMPLES 

AI  researchers  have  adopted  a  set  of  basic  definitions  for  the  word 
“pattern”  which  are  fairly  consistent  with  the  definitions  used  by  re¬ 
searchers  in  other  fields  (e.g.,  “numerical  taxonomy,”  “behavioristic 
psychology,”  “theoretical  linguistics”).  The  definitions  are  not  very 
hard  to  understand.  However,  since  the  word  “pattern”  is  usually  not 
defined in everyday conversation, this section is devoted to an explication
of its use in AI research and a discussion of some general problems
involving “patterns” that have been considered by AI researchers.

A  pattern  is  a  collection  of  objects,  each  of  which  has  the  property 
that  it  satisfies  a  certain  criterion,  known  as  the  pattern  rule  for  the 
pattern.  The  objects  in  a  pattern  are  said  to  be  pattern  examples.  (Re¬ 
search  papers  sometimes  confuse  these  ideas,  using  the  word  “pattern” 
to  denote  what  we  have  chosen  to  call  pattern  rules  and  pattern  ex¬ 
amples.)  Artificial  intelligence  research  has  been  concerned  with  sev¬ 
eral  basic  problems  involving  patterns,  pattern  rules,  and  pattern  ex¬ 
amples. 

1. (Classification) Given an object and a collection of pattern
rules, determine which pattern rules are satisfied by the
object.

2. (Matching) Given a pattern rule and a collection of objects,
find those objects which satisfy the pattern rule.

3. (Description, or Articulation) Given an object, find a description
for it in terms of pattern rules that are satisfied
by the parts of the object, or by the object itself.

4. (Learning) Given a collection of objects, some of which do
and some of which do not belong to a given pattern, determine
a pattern rule for those that do belong to the given
pattern.

Each  of  these  problems  may  occur  in  a  way  which  involves  the 
others  as  subproblems.  In  addition,  there  are  important  problems  of 
representation,  which  involve  finding  languages  with  which  to  state  pat¬ 
tern  rules. 
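
The first, second, and fourth of these problems can be illustrated by a short sketch in Python. It is only an illustration under stated assumptions: objects are taken to be integers, a pattern rule is modeled as a function that returns true exactly for pattern examples, and the names classify, match, learn, and RULES are hypothetical. The learning procedure shown merely filters a fixed list of candidate rules, which is far weaker than the learning methods discussed later in this chapter. The problem of description is omitted.

```python
import math
from typing import Callable, Dict, Iterable, List

# Assumption: a pattern rule is modeled as a predicate that returns True
# exactly when an object (here, an integer) is a pattern example.
PatternRule = Callable[[int], bool]


def classify(obj: int, rules: Dict[str, PatternRule]) -> List[str]:
    """Classification: name the pattern rules that the object satisfies."""
    return [name for name, rule in rules.items() if rule(obj)]


def match(rule: PatternRule, objects: Iterable[int]) -> List[int]:
    """Matching: find the objects that satisfy the given pattern rule."""
    return [obj for obj in objects if rule(obj)]


def learn(positive: Iterable[int], negative: Iterable[int],
          candidates: Dict[str, PatternRule]) -> List[str]:
    """Learning (toy version): keep each candidate rule that accepts every
    object known to belong to the pattern and rejects every object known
    not to belong to it."""
    pos, neg = list(positive), list(negative)
    return [name for name, rule in candidates.items()
            if all(rule(p) for p in pos) and not any(rule(n) for n in neg)]


# Hypothetical pattern rules, chosen only for illustration.
RULES: Dict[str, PatternRule] = {
    "even": lambda n: n % 2 == 0,
    "square": lambda n: n >= 0 and math.isqrt(n) ** 2 == n,
    "multiple_of_3": lambda n: n % 3 == 0,
}
```

For instance, classify(36, RULES) names all three rules, while learn([4, 16, 36], [9, 25], RULES) retains only "even", since 9 and 25 are squares and 4 is not a multiple of 3.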


EXAMPLE 5-1. “SUNFLOWER” PATTERNS. This example was used
in  Chapter  2  for  a  brief  discussion  on  the  nature  of  mathematical 
descriptions.  Figure  5-1  shows  an  example  of  a  sunflower  pat¬ 
tern.  This  pattern  example  of  a  sunflower  pattern  is  a  collection 
of  dots  in  the  plane.  For  simplicity’s  sake,  each  dot  is  con¬ 
sidered  to  be  simply  a  “point.”  A  dot  can  be  described  by  giv¬ 
ing  its  position  relative  to  some  pair  of  fixed  reference  points  in 
the  plane,  one  to  serve  as  the  origin  and  the  other  to  establish 
a  scale  and  a  baseline  for  angular  measurements.  Thus, 
r = 11.1, θ = 2 is a (polar coordinate) description of a dot. We
say  that  a  dot  belongs  to  a  sunflower  pattern  example  if  and 
only  if  it  satisfies  the  pattern  rule  for  the  sunflower  pattern.  This 
pattern may be described either by presenting some of its pattern
examples (we presented one in Fig. 5-1) or by stating a
pattern rule for it. An English statement of a pattern rule for the
sunflower  pattern  example  shown  in  Fig.  5-1  is:  A  collection 
of  dots  is  an  example  of  the  sunflower  pattern  if  and  only  if 
each  dot  is  the  intersection  of  2  of  the  24  Archimedean  spirals 
that  have  equations  obtained  by  substituting  for  k  any  value 
between  1  and  12,  inclusive,  and  by  substituting  for  i  either 
+1 or −1, in the expression


when  a  suitable  pair  of  reference  points  is  chosen.  An  infinite 
number  of  dots  can  belong  to  such  a  collection. 


EXAMPLE  5-2.  RECOGNIZING  PRINTED  CHARACTERS.  Much  early 
research  in  pattern  recognition  was  motivated  by  a  desire  to 
build  machines  (known  as  optical-character  recognizers,  or 
OCR’s)  that  would  be  capable  of  reading  alphabet  and  number 
characters,  either  written  or  printed  on  paper.  OCR’s  currently 
are very good at reading certain special types of machine-printed
characters, rather good (about 80% accurate) at recognizing
typed and hand-printed characters, and very poor at
recognizing handwritten or script letters.

Figure 5-1. The Archimedean sunflower pattern.

When we say a machine can “read” or “recognize” certain
characters,  we  are  essentially  talking  about  a  problem  of  pat¬ 
tern  classification.  For  example,  there  are  several  possible  ways 
of  writing  or  printing  the  letter  A.  Most  ways  produce  one  of 
several  possible  distributions  of  ink  on  a  paper  surface  that  an 
(English-literate)  human  will  be  capable  of  identifying  as  an 
example of the letter A. Thus, the letter A is a pattern; each
distribution  of  ink  on  paper  that  is  identified  by  people  as  being 
an  A  is  a  pattern  example  of  the  letter  A.  A  machine  recognizes 
the  letter  A  if,  whenever  it  is  presented  with  a  pattern  example 
of  A,  it  outputs  some  signal  corresponding  to  A  (it  may,  for 
example,  print  its  own  version  of  A),  and  if  it  never  outputs  that 
signal when it is not presented with a pattern example of A.¹

Similarly,  we  can  define  what  it  means  for  a  machine  to  recognize 
other  characters  (b,  n,  f,  1,  2,  etc.).  If  a  machine  is  to  recognize  a 
character  or  pattern,  it  must  have  a  corresponding  pattern  rule  that  can 
be  applied  to  anything  which  is  presented  to  it,  to  test  whether  or  not 
the thing presented is a pattern example. Some OCR’s are given the pattern
rules they use to recognize characters and others are designed to
develop their own pattern rules (see the next section). In each case,
when presented with a distribution of ink on paper, the OCR is required,
in  effect,  to  classify  that  distribution  of  ink  as  being  a  pattern  example 
of  some  pattern.  (It  may  classify  it  as  being  ambiguous.)  For  further 
information  on  OCR’s,  see  Holt  (1968),  Munson  (1968),  and  Duda  and 
Hart  (1968). 

EXAMPLE  5-3.  SEQUENCE  PREDICTION.  Our  definition  of  “pat¬ 
tern”  makes  no  reference  to  time  or  sequentiality.  However,  it 
is  possible  in  our  formalism  to  talk  about  perception  of  se¬ 
quential  patterns.  For  example,  consider  the  problem  of  “se¬ 
quence  prediction.”  Initially,  one  is  presented  with  some  finite 
sequence  of  objects;  say,  numbers.  Thus,  one  might  be  shown 
the sequence σ = 0,1,1,2,3,5,8,13. The assumption is that the
sequence will continue; one’s problem is to “predict” how it will
continue. In other words, we assume that σ is an initial portion of
some unknown, infinite sequence of numbers. Given an initial
portion of the infinite sequence, we attempt to predict the remainder
of the sequence. We may make our prediction either by
presenting some finite sequence σ′ as an immediate continuation
of σ or by presenting some Turing machine T so that T(σ) will
effectively print out the complete continuation of σ. Thus, for
the sequence σ, we might predict an immediate continuation
of σ to be the sequence σ′ = 21,34. Or we might predict a complete
continuation of the sequence by presenting a Turing
machine that would implement the rule: “Given an initial portion
σ, generate the number that follows the last element of σ by
adding the last two elements of σ to each other. Reset σ to be
the old initial portion followed by the number generated, and
begin again.” (Thus, T would generate 21 by adding 8 and 13,
T would generate 34 by adding 13 and 21, etc.) In effect, each
initial portion of the sequence to be predicted can be considered
as a pattern example of that sequence.

¹ Presenting a pattern example of A to the machine usually means placing
the piece of paper with its distribution of ink in an appropriate position before a
television camera or equivalent scanner. The camera will make an “initial description”
of the piece of paper; this description will be a collection of electric
signals that can be processed by the machine.

It  is  rather  easy  to  see  how  a  Turing  machine  that  predicts  the 
complete  continuation  of  a  sequence  can  be  used  to  construct  a  pattern 
rule  (Turing  machine)  that  will  tell  us  what  sequences  are  pattern 
examples  (initial  portions)  of  the  infinite  sequence. 
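
The construction can be sketched in Python, with an ordinary program standing in for the Turing machine T. The names continuation, predict, and is_initial_portion are hypothetical, and the choice of (0, 1) as the opening of the infinite sequence is an assumption taken from the example sequence above.

```python
from typing import Iterator, List, Sequence


def continuation(sigma: Sequence[int]) -> Iterator[int]:
    """Stand-in for T: yield the numbers that follow the initial portion
    sigma, each obtained by adding the last two elements so far."""
    if len(sigma) < 2:
        raise ValueError("the rule needs at least two elements")
    a, b = sigma[-2], sigma[-1]
    while True:
        a, b = b, a + b
        yield b


def predict(sigma: Sequence[int], n: int) -> List[int]:
    """Return the first n elements of the predicted continuation."""
    gen = continuation(sigma)
    return [next(gen) for _ in range(n)]


def is_initial_portion(candidate: Sequence[int],
                       seed: Sequence[int] = (0, 1)) -> bool:
    """Pattern rule built from the predictor: True if candidate is an
    initial portion of the infinite sequence that opens with seed."""
    k = len(seed)
    if len(candidate) <= k:
        return tuple(candidate) == tuple(seed[:len(candidate)])
    if tuple(candidate[:k]) != tuple(seed):
        return False
    return list(candidate[k:]) == predict(seed, len(candidate) - k)
```

Here predict([0, 1, 1, 2, 3, 5, 8, 13], 2) yields 21 and 34, and the pattern rule accepts that initial portion while rejecting a sequence such as 0,1,1,2,4.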

However,  the  “problem  of  sequence  prediction”  is  complicated 
by two facts:

1. There are infinite sequences of numbers that cannot be effectively
enumerated by any Turing machine (see Chaitin,
1966, 1969).

2. Given any two sequences of numbers, say σ and σ′, it is
possible to find a Turing machine T which predicts that σ′
will be the immediate continuation of σ.

In  other  words,  there  exist  sequences  that  cannot  be  predicted  with 
complete  accuracy  by  any  Turing  machine,  and  it  is  theoretically  pos¬ 
sible  to  justify  any  finite  prediction  of  the  continuation  of  a  given 
sequence  by  reference  to  some  Turing  machine. 
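
The second fact can be made concrete by a deliberately trivial sketch in Python: a “machine” that simply stores σ followed by whatever continuation σ′ one wishes to justify. The function rote_predictor and its filler parameter are hypothetical; a lookup table stands in here for a Turing machine.

```python
from typing import Callable, Sequence


def rote_predictor(sigma: Sequence[int], sigma_prime: Sequence[int],
                   filler: int = 0) -> Callable[[int], int]:
    """Return a function giving the element at each position of a sequence
    that begins with sigma, continues with sigma_prime, and is padded with
    an arbitrary filler value thereafter."""
    table = list(sigma) + list(sigma_prime)

    def element(index: int) -> int:
        if index < 0:
            raise IndexError("positions are counted from zero")
        return table[index] if index < len(table) else filler

    return element
```

Two such predictors built from the same σ, one with σ′ = 21,34 and one with σ′ = 7,7, agree on every element of σ and disagree immediately afterwards. Nothing in σ alone distinguishes them, which is why some notion of simplicity is needed.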

Consequently,  the  problem  of  sequence  prediction  may  be  restated 
as: “Find a simple Turing machine that can, given a blank tape, enumerate
the sequence σ and its complete continuation within a given,
required ‘accuracy’.” The concepts of “simple” and “accuracy” can be
given mathematical definitions (e.g., see Arbib, 1969, p. 229). We may
therefore suppose that we have chosen some definitions. Let us hold
the accuracy required of our prediction at a constant level and imagine
looking at all the Turing machines (TM’s) that, with this accuracy,
predict (enumerate) σ and its continuation. Some TM’s will be simpler
than others, but it is possible that more than one TM will have the
greatest value of simplicity. Thus, there may be many predictions for
the sequence σ, all of which are equally valid. We should therefore
generalize the problem of sequence prediction and state: “Given
σ, find the set of most simple Turing machines that, within a given required
accuracy, predict the continuation of σ.”
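
A toy version of this generalized problem can be sketched in Python. Every modeling choice below is an assumption made for illustration and is not drawn from the definitions cited above: the “machines” are restricted to recurrences x[n] = c0 + c1·x[n−1] + c2·x[n−2] with small integer coefficients, “accuracy” is the number of elements of σ predicted wrongly, and “simplicity” is measured by the sum of the absolute values of the coefficients (smaller is simpler).

```python
from itertools import product
from typing import List, Sequence, Tuple

Coeffs = Tuple[int, int, int]  # (c0, c1, c2)


def generate(coeffs: Coeffs, seed: Sequence[int], length: int) -> List[int]:
    """Run the recurrence from the first two elements of seed."""
    c0, c1, c2 = coeffs
    seq = list(seed[:2])
    while len(seq) < length:
        seq.append(c0 + c1 * seq[-1] + c2 * seq[-2])
    return seq[:length]


def errors(coeffs: Coeffs, sigma: Sequence[int]) -> int:
    """Accuracy measure: how many elements of sigma are predicted wrongly."""
    predicted = generate(coeffs, sigma, len(sigma))
    return sum(p != s for p, s in zip(predicted, sigma))


def cost(coeffs: Coeffs) -> int:
    """Simplicity measure (an assumption): smaller cost means simpler."""
    return sum(abs(c) for c in coeffs)


def simplest_predictors(sigma: Sequence[int], max_errors: int = 0,
                        bound: int = 2) -> List[Coeffs]:
    """Return every candidate of minimum cost that predicts sigma within
    the required accuracy; the result may contain more than one."""
    if len(sigma) < 2:
        raise ValueError("sigma must contain at least two elements")
    ok = [c for c in product(range(-bound, bound + 1), repeat=3)
          if errors(c, sigma) <= max_errors]
    if not ok:
        return []
    best = min(cost(c) for c in ok)
    return sorted(c for c in ok if cost(c) == best)
```

Under these assumptions the sequence 0,1,1,2,3,5,8,13 has the single simplest predictor (0, 1, 1), whereas the short sequence 1,1,2 has six equally simple predictors, among them (2, 0, 0), (0, 2, 0), and (0, 1, 1), which continue it with 2, 4, and 3 respectively.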

Most real-world problems of sequence prediction cannot be solved
very  easily  by  using  a  Turing  machine  formalization.  In  fact,  no  very 
good  formalization  (language)  for  sequence  prediction  in  real-world 
problems  has  yet  been  developed.  Aside  from  its  metaphorical, 
theoretical  relationship  to  subjects  like  the  theory  of  scientific  inquiry 
(see  Chapter  2),  there  has  been  some  question  as  to  the  relevance  of 
the  problem  of  sequence  prediction  to  practical  robotics  and  artificial 
intelligence. To quote McCarthy and Hayes (1969):

Imagine  a  person  who  is  correctly  predicting  the  course  of  a  football 
game  he  is  watching;  he  is  not  predicting  each  visual  sensation  (the 
play  of  light  and  shadow,  the  exact  movements  of  the  players  and 
the  crowd).  Instead  his  prediction  is  on  the  level  of:  team  A  is 
getting  tired;  they  should  start  to  fumble  or  have  their  passes  inter¬ 
cepted. 

Similarly,  attempts  to  use  numerical  sequence  prediction  techniques  to 
forecast the stock market are shortsighted unless they also process
information about the multitude of events in the real world which can
affect the market. From the standpoint of AI research, a more relevant
kind of sequence prediction to investigate would be the prediction of
sequences of relational structures. The problem of sequence prediction
also occurs in AI research into language understanding, where it may be
necessary to predict the next word or phrase in a sentence, given the
preceding words. Here the prediction must be made relative to a grammar
for the language and to some model for the possible meanings of
the sentence. Finally, a paper by Slagle and Lee (1971) shows how
game-tree searching techniques can be applied to sequential pattern
recognition.

EXAMPLE  5-4.  RELATIVELY  PRIME  NUMBERS.  This  example  is 
similar  to  the  sunflower  pattern  discussed  in  Example  5-1.  Two 
integers  are  said  to  be  relatively  prime  if  and  only  if  they  have 
no  common  divisor  other  than  unity.  Thus,  4  and  9  are  relatively 
prime because the divisors of 4 are 1, 2, and 4 and the divisors of
9 are 1, 3, and 9. Similarly, 12 and 21 are not relatively prime because
both can be divided by 3. Figure 5-2 shows part of a
pattern 𝒫 of dots in the plane (here the dots are colored white
and the plane is colored black), which has the following pattern
rule: “A dot is a pattern example of the pattern 𝒫 if and only
if its x and y coordinates are relatively prime integers.” Figure
5-2 shows all those dots (pattern examples) of 𝒫 whose integer
coordinates are each greater than or equal to zero and less than
or equal to 256. The figure,² to quote Reichardt (1971), “shows
the intriguing combination of regularity and randomness which
characterizes the distribution of prime numbers and the property
of joint divisibility.”

Figure 5-2. The relatively prime integers from 0 to 256.
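
The pattern rule of this example is simple enough to state directly as a program. The following Python sketch tests the rule with the greatest common divisor and prints a small corner of the pattern; the function names and the characters used for dots are hypothetical, and it is an assumption that the original figure treats the coordinate 0 in the same way as this sketch does.

```python
from math import gcd


def is_pattern_example(x: int, y: int) -> bool:
    """Pattern rule: the dot (x, y) belongs to the pattern if and only if
    x and y have no common divisor other than 1."""
    return gcd(x, y) == 1


def render(size: int) -> str:
    """Draw the dots with 0 <= x, y <= size, using '*' for a pattern
    example and '.' otherwise; the row for the largest y comes first."""
    rows = []
    for y in range(size, -1, -1):
        rows.append("".join("*" if is_pattern_example(x, y) else "."
                            for x in range(size + 1)))
    return "\n".join(rows)
```

Calling render(256) produces a character version of the full pattern described above, and is_pattern_example(4, 9) and is_pattern_example(12, 21) reproduce the two cases worked in the text.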

² The Frontispiece to this chapter and Fig. 5-2 are reprinted with permission
from the computer artwork of M. R. Schroeder; copyright © Bell Laboratories,
1973.

EXAMPLE 5-5. WORDY EYE. The Frontispiece to this chapter is a
picture that contains pattern examples of at least five patterns:
the letters of the English alphabet; the words of the English
language; the sentences of the English language; the sequence
formed by repeating ONE PICTURE IS WORTH A THOUSAND
WORDS; and the set of pictures that depict a human
eye. This picture nicely illustrates the hierarchical, structural
nature of many patterns. A system for understanding patterns
in the real world must be capable of dealing with the ways in
which patterns can be made up of patterns. Thus, we may
choose to state a pattern rule for the letter A as follows: An
object is a pattern example of A if it is made up of an object
that is a pattern example of the “upward-angle” pattern and an
object that is a pattern example of the “horizontal-line” pattern,
and these two objects are related to each other in a certain way.
Our discussion of vision systems will trace a hierarchy of patterns
(point, line, curve, region, texture, . . . , object, scene)
which should be recognized by machines that can see. Especially
relevant in this regard is the explication of “hierarchical synthesis”
given by Barrow et al. (1972).
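
The pattern rule for the letter A stated in Example 5-5 can be sketched in Python as a rule built from two simpler rules. The representation is an assumption made for illustration: an object is a list of three straight line segments with the y axis pointing upward, and the “certain way” in which the parts are related is taken to mean only that the horizontal line lies below the apex of the angle and above both of its feet. A practical rule would also have to tolerate noise and check that the bar actually meets the legs.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def is_horizontal_line(seg: Segment) -> bool:
    """Sub-pattern rule: both endpoints at the same height, nonzero length."""
    (x1, y1), (x2, y2) = seg
    return y1 == y2 and x1 != x2


def upward_angle_apex(a: Segment, b: Segment) -> Optional[Point]:
    """Sub-pattern rule: if a and b share exactly one endpoint, and their
    other endpoints lie below it on opposite sides, return that apex."""
    shared = set(a) & set(b)
    if len(shared) != 1:
        return None
    apex = next(iter(shared))
    foot_a = a[0] if a[1] == apex else a[1]
    foot_b = b[0] if b[1] == apex else b[1]
    below = foot_a[1] < apex[1] and foot_b[1] < apex[1]
    opposite = (foot_a[0] - apex[0]) * (foot_b[0] - apex[0]) < 0
    return apex if below and opposite else None


def is_letter_A(segments: List[Segment]) -> bool:
    """Composite rule: an upward angle plus a horizontal line whose height
    is between the apex and the higher of the two feet."""
    if len(segments) != 3:
        return False
    for i, bar in enumerate(segments):
        if not is_horizontal_line(bar):
            continue
        legs = segments[:i] + segments[i + 1:]
        apex = upward_angle_apex(legs[0], legs[1])
        if apex is None:
            continue
        higher_foot = max(min(p[1] for p in leg) for leg in legs)
        if higher_foot < bar[0][1] < apex[1]:
            return True
    return False
```

The same three segments turned upside down (a “V” with a bar) fail the upward-angle rule and are rejected, which shows how the composite rule depends on its sub-pattern rules.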

EXAMPLE  5-6.  SALT  AND  PEPPER  SHAKERS.  Mr.  and  Mrs.  Jones 
of  A.D.  2100  are  eating  a  quiet  dinner  at  home.  Mrs.  Jones  de¬ 
cides  her  fried  seaweed  is  not  salty  enough  and  reaches  for  the 
saltshaker,  only  to  discover  that  the  table  has  been  inadequately 
set,  and  there  is  no  saltshaker  on  it.  “Robbie,”  she  calls,  “would 
you  bring  us  the  saltshaker?”  Robbie  the  Robot  floats  into  the 
kitchen  and  proceeds  to  look  for  a  saltshaker.  It  finds  two  ob¬ 
jects,  each  of  which  might  be  a  saltshaker  (they  are  the  right 
shape  and  size),  but  they  are  each  opaque— the  robot  can’t 
see  their  contents.  Looking  more  closely  at  the  objects,  Robbie 
notices  that  there  are  holes  in  the  top  of  one  of  the  objects  and 
that  these  holes  are  placed  so  as  to  form  a  pattern  example  of 
the  letter  S.  Robbie  therefore  takes  this  object  to  the  dinner 
table.  By  this  time  Mrs.  Jones  also  wants  the  peppershaker  and 
Robbie,  having  been  too  literal-minded  (but  next  year’s  models 
will  be  better  .  .  .),  must  go  back  to  the  kitchen.  However,  it 
has  successfully  recognized  a  pattern  example  of  the  pattern 
“saltshaker.” 

EXAMPLE  5-7.  EXTRATERRESTRIAL  PLANETARY  EXPLORATION. 
Let  us  suppose  that  a  team  of  robots  is  conducting  a  slow,  but 
patient,  geological  exploration  of  the  Moon.  Because  of  the 
time  lag  in  communications  from  Earth,  the  robots  form  a 
largely  self-directing  collection  of  machines.  In  fact,  each  robot 
is  somewhat  independent  of  the  others  because  they  are  too 
thinly  distributed  about  the  Moon’s  surface  to  be  in  frequent 
contact with each other. One of the robots, M65, is safely
navigating  a  narrow  path  between  two  craters  when  a  moon- 
quake  sends  it  sliding  out  of  control  over  the  edge  of  the  path 
and  down  the  slope  of  one  of  the  craters.  M65  arrives  intact 
but  disoriented  at  the  crater  bottom.  It  doesn’t  know  precisely 
where  it  is  or  where  to  go  next.  The  caterpillar-treaded  robot 
crawls  back  up  the  slope  of  the  crater.  Reaching  the  edge,  M65 
takes  a  panoramic  picture  of  its  surroundings,  and  generates  a 
description  of  the  scene’s  major  details  (shape  and  placement 
of pattern examples of the patterns “mountain,” “large boulder,”
“crater,” etc.). It compares this description to another description
that it had generated of its surroundings shortly before
the  moonquake  occurred.  Noting  some  similarities,  it  attempts 
to  reestablish  its  old  position  and  orientation  and  to  proceed 
with  its  business. 

Examples  5-6  and  5-7  are,  of  course,  entirely  fanciful  and  beyond 
the current state-of-the-art in AI research. Indeed, for Robbie the Robot
to  behave  as  it  did  in  Example  5-6,  it  would  have  to  be  able  to  solve 
the  problems  of  recognizing  and  understanding  human  speech,  which 
are  at  least  as  difficult  as  simply  recognizing  and  distinguishing  salt 
and  pepper  shakers.  Similarly,  the  techniques  necessary  for  robot  M65 
to “reestablish its orientation” and “navigate successfully” over long
distances of lunar terrain (without human assistance) may not be
available for a few decades. However, it should be noted that AI researchers
have made serious proposals that artificial intelligence techniques
be used to construct machines that could carry out less pretentious,
but still somewhat self-directing, explorations on Mars (see
McCarthy, 1964a; Glaser, McCarthy, and Minsky, 1964).

EXAMPLE  5-8.  RECOGNIZING  A  CUBE.  Succeeding  sections  will 
discuss  techniques  for  using  a  television  picture  of  a  scene  to 
produce  a  line  drawing  of  the  scene.  Suppose  the  scene  con¬ 
tains  only  a  cube  resting  on  an  unknown  surface.  We  might  then 
obtain  a  line  drawing  something  like 

Now this line drawing is a simple description of the original
television picture of the scene. The line drawing itself may be
considered  as  an  object,  however,  and  techniques  that  recognize 
some  line  drawings  as  being  examples  of  the  pattern  “descrip¬ 
tions  of  a  cube”  can  be  considered.  Thus,  depending  on  the 
orientation  of  the  camera  with  respect  to  the  cube  and  the 
surface,  any  one  of  an  infinite  number  of  line  drawings  might 
be  obtained  that  would  describe  a  pattern  example  of  the  pat¬ 
tern  “cube.”  We  recognize  each  of  these  as  belonging  to  a 
pattern  different  from  that  to  which  the  line-drawing  below 
belongs. 


As  is  illustrated  by  the  last  example,  most  pattern-recognition  pro¬ 
grams  really  work  with  descriptions  of  things  rather  than  with  the 
actual  things  themselves.  Thus,  to  find  pattern  examples  of  various 
patterns  (“cubes,”  “boxes,”  etc.)  in  a  real-world  environment,  the 
computer  will  typically  make  use  of  a  television  camera  picture  of  that 
environment.  This  picture  constitutes  its  initial  description  of  the  en¬ 
vironment.  The  initial  description  may  be  processed  to  yield  other 
descriptions  of  the  environment,  or  of  parts  of  the  environment,  and 
these  descriptions  may  be  recognized  as  “descriptions  of  a  cube,”  “de¬ 
scriptions  of  a  box,”  etc.  The  computer  may  then  print  out  that  it  has 
found  a  pattern  example  of  the  pattern  “cube”  in  the  environment;  if 
necessary,  it  may  use  its  descriptions  to  help  guide  a  mechanical  arm 
that  would  attempt  to  pick  up  the  pattern  example  of  “cube”  that  was 
found.  Of  course,  when  the  computer  does  so,  it  may  find  that  its  de¬ 
scriptions  are  incorrect. 

It  is  usually  possible  to  describe  a  given  object  in  many  different 
ways.  The  kind  of  description  one  uses  will  in  general  depend  upon 
the problem at hand. The major kinds of descriptions that are currently
used by pattern processing systems all have the structural
characteristics of vectors, matrices, strings, lists, and graphs.

EXAMPLE 5-9. PATTERN MATCHING AND TEMPLATES. One representation
that has been developed for stating pattern rules
having each of these five kinds of structure is the use of templates
in pattern matching. An early example of this technique
was presented by Uhr and Vossler (1963), who described a
program that successfully generated its own set of template
matrices, which it used to recognize handprinted characters.
Similarly, in Chapter 4 we discussed the work of Koffman
(1967) and Citrenbaum (1972), who presented programs that
could develop and use templates with the structure of graphs to
play positional games. Most of the recent programming languages
for AI research make extensive use of templates with the
structure of lists for pattern matching: pattern matching in this
case means locating subexpressions in a larger expression or
data base (set of expressions), and perhaps naming the located
subexpressions by assigning them as values to variables. As an
example, we shall briefly describe the pattern matching language
used in the AI programming language QA4 (Rulifson, Derksen,
and Waldinger, 1972).

In this language, a pattern rule can be any list expression
that is correctly made up of atoms, variables, and certain “pattern
operators” defined for QA4. Intuitively, two expressions
match if their elements have the same values, at all levels. Thus,
an atom (essentially, an alphanumeric string) is treated as a
constant, and normally will only match another instance of itself;
if an atom is to be treated as a variable it must have one
of six possible variable prefixes: ←, ?, $, ←←, ??, and $$. The
first three prefixes restrict the variable to match only individual
terms (expressions), while the second three allow variables to
match “fragments,” or segments of lists. Thus, X, A1263, and
ATOM are constant atoms (when they occur in a pattern rule);
←X, ?Y, and $Z are variables restricted to individual terms; and
←←A, ??B, and $$C are fragment variables. The prefix ←
permits a variable to match any individual term, regardless of
the variable’s previous value, and specifies that after the match
the variable will have as its value the term it is matched against.
The prefix ? allows a variable to match only its previous value,
if any (QA4 allows variables to have no value); otherwise it
is allowed to match any individual term, and acquire that term
as its value. Finally, the prefix $ allows a variable to match only
its previous value; if the variable does not have a value initially,
then it is not allowed to match anything. The three double-character
prefixes have analogous meanings to those of their
single-character counterparts, except that they restrict variables
to match only fragments of lists.

Thus, the expression (←X (?Y $Z) 2 ←←W) is a pattern
rule in the QA4 language; a wide variety of expressions will
satisfy, or match, this pattern rule (template), given that the
variables X, Y, Z, and W have the proper initial values, where
required. Some expressions which might match this pattern
rule are (PLUS (SIN A) 2 4.5 PI) and ((THE AMERICAN
CONGRESS) (HAS EXACTLY) 2 HOUSES). Other expressions
cannot match this pattern rule, regardless of the initial
values of its variables: examples of such expressions are
(TIMES 2 3) and ((AN OUT) REQUIRES 3 STRIKES). It
should be clear how this language allows a pattern rule to specify
the structural nature of the pattern examples which satisfy it.
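
The matching rules for the three single-term prefixes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not QA4 itself: the function name `match`, the use of nested tuples for list expressions, and the ASCII spelling `<-` for the arrow prefix are all choices made here, and fragment variables and pattern operators are left out.

```python
def match(pattern, expr, bindings=None):
    """Return a dict of variable values if pattern matches expr, else None.

    Assumptions of this sketch: list expressions are nested tuples, atoms
    are strings or numbers, and variables are strings carrying one of the
    prefixes "<-", "?" or "$". Fragment variables are not handled.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, tuple):
        # A list pattern matches only a list of equal length, term by term.
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        for sub_pattern, sub_expr in zip(pattern, expr):
            bindings = match(sub_pattern, sub_expr, bindings)
            if bindings is None:
                return None
        return bindings
    if isinstance(pattern, str):
        if pattern.startswith("<-"):
            # Matches any term and takes that term as its new value.
            bindings[pattern[2:]] = expr
            return bindings
        if pattern.startswith("?"):
            # Matches its previous value if it has one, otherwise any term.
            name = pattern[1:]
            if name in bindings and bindings[name] != expr:
                return None
            bindings[name] = expr
            return bindings
        if pattern.startswith("$"):
            # Matches only its previous value; fails if it has none.
            name = pattern[1:]
            if name in bindings and bindings[name] == expr:
                return bindings
            return None
    # Constant atom: matches only another instance of itself.
    return bindings if pattern == expr else None


rule = ("<-X", ("?Y", "$Z"), 2)
print(match(rule, ("PLUS", ("SIN", "A"), 2), {"Z": "A"}))
print(match(rule, ("TIMES", 2, 3), {"Z": "A"}))
```

The first call succeeds and returns values for X, Y, and Z; the second fails because the second term of the expression is not a list. Calling `match` on the first expression with no initial value for Z also fails, since a `$` variable without a value is not allowed to match anything.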

Among the special operators which further extend this
capability are .., PAND, and POR. If the subexpression .. pat
occurs in a pattern rule (where pat is itself a pattern rule), then
this subexpression matches an argument expression if pat
matches some subexpression of that argument, perhaps the entire
argument itself. Thus, if the initial values of the variables
X and Y are C and D, respectively, then the pattern rule
(.. $X .. $Y) matches the expression ((A B C) D). The
operators PAND and POR allow pattern rules to make use of
logical combinations of pattern rules. A pattern rule of the form
(PAND pat1 ... patn) matches an expression if and only if
that expression matches all the pattern rules from pat1 through
patn. Similarly, a pattern rule of the form (POR pat1 ... patn)
is satisfied by an expression if and only if that expression
matches at least one of the pattern rules from pat1 through patn.
Thus, the pattern rule (PAND ←X (TUPLE 1 ←Y)) matches
the expression (TUPLE 1 2), assigning X the expression
(TUPLE 1 2) as its value, and making 2 the value of Y.

This kind of pattern matching language has been useful in many
ways, perhaps most notably as an interlingua (intermediate language)
for question-answering systems. For example, Winograd’s English
understanding program (see Chapter 7) demonstrated how a wide variety
of English questions can be translated into PLANNER theorems (see
Chapter 6) that can use such pattern rules to represent the “essential
unknowns” of their respective questions. Similarly, the English question
“Why did the chicken cross the road?” might be translated into a QA4
expression like:

(AND (EXISTS (CHICKEN ?Y))
     (EXISTS (ROAD ?Z))
     (EXISTS (CROSS ?Y ?Z ?EVENT))
     (EXISTS (CAUSE ?X ?EVENT)))

Evaluation of this expression will cause a search of the current data
base (set of expressions that a QA4 program may treat as assertions
about the world) for expressions matching, successively, the pattern
rules (CHICKEN ?Y), (ROAD ?Z), (CROSS ?Y ?Z ?EVENT), and
(CAUSE ?X ?EVENT). The pattern matching facilities within such
programming languages as PLANNER, QA4, and CONNIVER (see citations
in the Bibliography under Hewitt, Rulifson, and Sussman) provide
one of the most general formalizations for pattern processing yet
developed by AI researchers. This generality derives from the utility of
storing symbolic data in list structures, the expressiveness of the pattern
rule notation for describing list structures, and the fact that the
formalization of these systems does not require the use of any specific
terminology or facts associated with particular real-world
pattern-perception problems.

In  closing  this  section,  reference  should  also  be  made  to  an  earlier, 
but  still  very  general  group  of  formalizations  for  pattern  processing  sys¬ 
tems,  which  includes  perceptrons  and  statistical  decision  theoretic  pat¬ 
tern  recognition  (note  5-1).  There  is  not  space  here  to  discuss  these 
topics  but,  fortunately,  excellent  summaries  of  them  are  given  in  the 
books  by  Minsky  and  Papert  (1969),  Duda  and  Hart  (1973),  and 
Mendel  and  Fu  (1970).  In  Chapter  7,  the  topic  of  statistical  decision 
theoretic  pattern  recognition  is  briefly  discussed  in  comparison  with  the 
grammatical  inference  approach  to  pattern  recognition. 


EYE  SYSTEMS  FOR  COMPUTERS 

The most basic part of a computer system that performs visual
pattern perception is the eye system, which is simply the collection of
computer eyes that it can control, and from which it can receive
information. A computer eye is a device for producing descriptions of the
electromagnetic radiation in space. In general, such an eye consists of a
sensor, optics, and usually an illuminator (Earnest, 1967). The purpose
of the illuminator is to direct electromagnetic radiation into the
environment, the purpose of the sensor is to receive electromagnetic radiation
from the environment, and the purpose of the optics is to process the
radiation, either as it leaves the illuminator or as it enters the sensor.
The sensor describes the electromagnetic radiation that it receives, by
converting it into an electric signal that can be stored and processed as
data by the computer. Optics serve to change the radiation received from
the environment by the sensor, so that typically a given sensor can
describe different views of its environment without itself being moved. For
ordinary light (as distinguished from infrared, ultraviolet, etc.), the
optics will usually be a movable collection of shutters, filters, lenses,
mirrors, and prisms.

Figure 5-3. Artificial eyes.

AI  research  has  so  far  given  primary  attention  to  two  types  of 
artificial  eye,  known  as  imaging  eyes  and  jumping  (or  flying)  spot  eyes 
(Earnest,  1967).  Figure  5-3  shows  diagrams  for  these  types  of  eyes. 
The  jumping-spot  eye  makes  use  of  an  illuminator  (often  a  laser)  that 
is  capable  of  putting  out  a  very  narrow  beam  of  light.  The  optics  of  the 
jumping-spot  eye  cast  the  beam  in  different  directions  throughout  the 
environment.  The  sensors  (it  is  desirable  to  use  several)  of  the  eye 
receive  radiation  from  the  beam  that  is  reflected  back  by  the  environ¬ 
ment.  The  total  amount  of  radiation  received  by  the  sensors  is  compared 
with  the  total  amount  of  radiation  emitted  by  the  illuminator,  to  yield  a 
score for the “reflectivity”* of the environment in each direction that is
illuminated.  The  initial  description  of  the  environment  that  is  produced 
by  the  jumping-spot  eye  corresponds  simply  to  a  list  of  directions  and 
their  reflectivities.  This  list  is  coded  for  use  by  the  computer  as  a 
sequence  of  electric  signals. 
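
The initial description produced by a jumping-spot eye can be pictured as a small data structure. The sketch below is only an illustration: the function name `reflectivity_list`, the (azimuth, elevation) encoding of a direction, and the sample readings are all assumptions made here, not details of any particular eye.

```python
def reflectivity_list(emitted, readings):
    """Build the initial description: a list of (direction, reflectivity).

    `emitted` is the amount of radiation put out by the illuminator for
    each direction, and `readings` maps a direction to the amounts received
    by the individual sensors. Units and direction encoding are assumed.
    """
    description = []
    for direction, sensor_values in readings.items():
        total_received = sum(sensor_values)
        # Score: total radiation received compared with radiation emitted.
        description.append((direction, total_received / emitted))
    return description


readings = {
    (0, 0): [0.25, 0.25],    # two sensors, beam cast in direction (0, 0)
    (10, 0): [0.0, 0.125],   # same sensors, beam cast in direction (10, 0)
}
scene = reflectivity_list(emitted=1.0, readings=readings)
print(scene)
```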

When  compared  to  other  types  of  artificial  eyes,  jumping-spot  eyes 
appear  to  offer  many  advantages  (such  as  the  natural  development  of  a 
visual-light  frequency  radar,  or  “lidar”),  but  some  disadvantages  (e.g., 
mechanical  problems  connected  with  the  use  of  ordinary  mirrors,  prisms, 
etc.,  in  the  optics  of  such  an  eye  may  make  it  difficult  to  scan  large 
scenes  at  rates  faster  than  five  frames  per  second,  thus  hampering  the 
analysis  of  motion  in  scenes). 

* The proper physical term to describe the reflecting ability of a material
surface is reflectance. The light received by a sensor in a jumping-spot eye is not
really a measure of the reflectance of any one material surface, since it may
depend on the placement of many objects in space.

Most AI research on visual perception has been concerned with
the use of imaging eyes. (See Figure 5-4.) An imaging eye is basically
the reverse of a jumping-spot eye; instead of several sensors and one
illuminator, an imaging eye has one sensor (typically a television
camera) and often has several illuminators. Thus, the sensor used in an
imaging eye is usually more complex than those in a jumping-spot eye.
The optics in an imaging eye generally control the way light is directed
into the sensor rather than out of the illuminators. With proper use of
the optics in an imaging system, the pictures produced by the eye can
be “focused,” “magnified,” “zoomed,” etc. A picture produced by an
imaging eye may be described as a large matrix, with each element of
the matrix being a number measuring the intensity of light in a given
volume of space. When a picture matrix is produced by an imaging
eye, it is, again, coded for use by the computer as a sequence of
electric signals. Generally, a picture matrix produced by an imaging
eye will contain less than 100,000 elements (in contrast to approximately
300 million rods and cones in the retina of the human eye).
Figure 5-5 shows an example picture of a real-world scene of fairly
simple objects, produced by an imaging eye at the Stanford Artificial
Intelligence Project.

Figure 5-4. A computer-controlled television camera. (Courtesy of Karl
Pingle and Lynn Quam, Stanford AI Project.)

Figure 5-5. Picture of a real-world scene produced by the
computer-controlled television camera shown in Figure 5-4.
(Courtesy of Karl Pingle, Stanford AI Project.)

Imaging eyes have the advantages that their illumination requirements
are roughly compatible with those necessary for humans and
that their optical systems have already been extensively developed
for use in ordinary photography. Furthermore, there is no difficulty in
using imaging systems to make motion pictures of scenes.


SCENE ANALYSIS

Picture  Enhancement  and  Line  Detection 

This  section  discusses  techniques  that  can  be  used  by  computers 
for  the  analysis  of  pictures.  As  in  the  preceding  section,  a  picture  is  con¬ 
sidered  to  be  a  large  matrix  of  numbers,  each  number  representing  the 
intensity  of  light  in  a  portion  of  space.  The  total  portion  of  space 
described  by  a  picture  will  be  referred  to  as  a  scene.  Our  primary  con¬ 
cern  is  to  show  how  a  computer  can  analyze  a  single  picture  of  a  given 
scene.  Techniques  for  analyzing  and  comparing  several  pictures  of  the 
same scene are described in Quam (1971) and Duda and Hart (1973).
The  techniques  we  discuss  can  be  grouped  into  three  classes:  “picture 
enhancement  and  line  detection,”  “perception  of  regions,”  and  “per¬ 
ception  of  objects.” 

Picture-enhancement  techniques  are  methods  for  using  one  picture 
to produce another. When used correctly, they can be of help in
discovering significant details in a picture.* However, because the picture
that  results  from  the  use  of  such  a  technique  usually  has  less  information 
content  than  the  original  picture,  picture  enhancement  techniques  cur¬ 
rently  seem  to  be  of  more  value  to  human  photographers  than  they  are 
to  computer  vision  systems.  Some  relatively  simple  picture-enhancement 
techniques  will  be  presented  here,  and  the  reader  is  referred  to  Duda  and 
Hart  (1973)  for  a  discussion  of  other,  more  complex  methods. 

* Quam (1971, pp. 78, 101) shows how picture-enhancement techniques
were used to detect a cloud on Mars, which would probably not have been
recognized without the use of these techniques. (Also see Leovy et al., 1971.)

One of the simplest picture-enhancement techniques is that of
noise “removal,” or smoothing. Usually, in developing a picture-matrix
description of a scene, some noise will be picked up, causing various
elements of the matrix to deviate from their correct value. If the noise is
random, such that noise in adjacent elements of the picture matrix is
uncorrelated, then a spatial averaging or smoothing technique may be
applied to reduce it. This technique consists simply of resetting the value
of each element of the picture matrix to be the average of the old values
of the picture elements in a “window” surrounding it. To illustrate,
suppose we smooth the picture matrix

    1  0  2  5  7
    0  2  4  5  6
    9  2  8  4  7
    9  2  7  5  7
    6  1  5  3  6

using 3x3 windows. Then the picture element that has value 8 will be
reset to have the value 4.3. Smoothing a picture usually introduces some
“blurring” in the picture matrix that is produced.
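
The smoothing step can be checked with a short Python sketch applied to the 5 x 5 picture matrix of the example. The function name `smooth` is introduced here for illustration, and clipping the window at the picture border is an assumption of the sketch; the text does not say how border elements are to be treated.

```python
def smooth(picture, size=3):
    """Replace each element by the mean of the old values in its window.

    Windows are clipped at the picture border (an assumption made here).
    """
    rows, cols = len(picture), len(picture[0])
    half = size // 2
    result = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            # Old values in the size x size window centered on (i, j).
            window = [
                picture[r][c]
                for r in range(max(0, i - half), min(rows, i + half + 1))
                for c in range(max(0, j - half), min(cols, j + half + 1))
            ]
            new_row.append(sum(window) / len(window))
        result.append(new_row)
    return result


picture = [
    [1, 0, 2, 5, 7],
    [0, 2, 4, 5, 6],
    [9, 2, 8, 4, 7],
    [9, 2, 7, 5, 7],
    [6, 1, 5, 3, 6],
]
smoothed = smooth(picture)
print(round(smoothed[2][2], 1))  # the element that had value 8
```

The nine values in the window around the element with value 8 sum to 39, so its new value is 39/9, or about 4.3, in agreement with the figure quoted above.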

Another  technique  for  noise  removal  consists  of  finding  each  pic¬ 
ture  element  that  differs  greatly  from  a  surrounding  set  of  approximately 
equivalent  picture  elements,  and  then  replacing  it  with  their  average 
value.  In  the  example  displayed  above,  this  technique  might  give  a  new 
value  of  6  to  the  element  that  has  value  (intensity)  3.  This  technique 
is  often  referred  to  as  salt-and-pepper  removal,  and  has  the  advantage 
that  it  will  usually  reduce  most  of  the  random  shot  noise  in  a  picture 
without causing the “blurring” created by smoothing.*
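
One way this second technique might be realized is sketched below. The function name and the two tolerances `spread_tol` and `diff_tol` are assumptions introduced for illustration; the text does not specify how “approximately equivalent” or “differs greatly” should be measured.

```python
def remove_salt_and_pepper(picture, spread_tol=2, diff_tol=2.5):
    """Replace isolated outliers by the average of their neighbors.

    An element is replaced only if its neighbors are roughly equal to each
    other (max - min <= spread_tol) and it differs from their average by
    more than diff_tol. Both tolerances are hypothetical parameters.
    """
    rows, cols = len(picture), len(picture[0])
    result = [row[:] for row in picture]
    for i in range(rows):
        for j in range(cols):
            # Old values of the surrounding elements, clipped at the border.
            neighbors = [
                picture[r][c]
                for r in range(max(0, i - 1), min(rows, i + 2))
                for c in range(max(0, j - 1), min(cols, j + 2))
                if (r, c) != (i, j)
            ]
            if not neighbors:
                continue
            mean = sum(neighbors) / len(neighbors)
            roughly_equal = max(neighbors) - min(neighbors) <= spread_tol
            if roughly_equal and abs(picture[i][j] - mean) > diff_tol:
                result[i][j] = mean
    return result


picture = [
    [1, 0, 2, 5, 7],
    [0, 2, 4, 5, 6],
    [9, 2, 8, 4, 7],
    [9, 2, 7, 5, 7],
    [6, 1, 5, 3, 6],
]
cleaned = remove_salt_and_pepper(picture)
print(cleaned[4][3])  # the element that had intensity 3
```

With these tolerances the element of intensity 3 has neighbors 7, 5, 7, 5, and 6, whose average is 6, which is the new value suggested in the text.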

Contouring, or isodifference detection, is often used in terrestrial
map making to emphasize lines of constant altitude. The technique
consists of establishing a sequence of brightness levels,

$$\theta_0 < \theta_1 < \theta_2 < \cdots < \theta_n$$

for which each picture element $P_{ij}$ in a given matrix has an intensity
$I_{ij}$ such that $\theta_k \le I_{ij} < \theta_{k+1}$ for some $k$. Each picture element is then
given a new intensity value corresponding to the appropriate $\theta_k$.
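
Contouring amounts to quantizing each intensity to the brightness level just at or below it. The following is a minimal sketch; the function name `contour`, the sample levels, and the treatment of intensities outside the range of levels (clamped to the lowest or highest level) are assumptions made here.

```python
from bisect import bisect_right


def contour(picture, levels):
    """Give each element the largest level that does not exceed it.

    `levels` must be sorted in increasing order. Intensities below the
    first level or at or above the last are clamped (an assumption).
    """
    def quantize(value):
        k = bisect_right(levels, value) - 1   # levels[k] <= value < levels[k+1]
        k = max(0, min(k, len(levels) - 1))
        return levels[k]

    return [[quantize(value) for value in row] for row in picture]


levels = [0, 3, 6, 9]
contoured = contour([[1, 4, 6], [8, 9, 2]], levels)
print(contoured)
```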

Edge enhancement, or sharpening, of a picture will produce a new
picture similar to that obtained by contouring. In the edge enhancement
of a picture only those picture elements that separate elements of greatly
varying intensity are shown. For each picture element $P_{ij}$ with intensity
value $I_{ij}$ of the matrix, we compute the “cross operator”

$$R_{ij} = \left( (I_{ij} - I_{i+1,j+1})^2 + (I_{i,j+1} - I_{i+1,j})^2 \right)^{1/2}$$

We then form the new picture matrix with elements $P'_{ij}$ that have
intensity $I'_{ij} = 1$ if $R_{ij} \ge \theta$, where $\theta$ is some threshold value, and
$I'_{ij} = 0$ otherwise. The threshold value $\theta$ determines how greatly the intensity
must vary in order to show a given picture element. (See Roberts,
1963.)
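
The cross operator is simple enough to state directly in code. In this sketch the function name is introduced for illustration, the comparison with the threshold is taken to be “greater than or equal,” and the last row and column (which have no diagonal neighbors) are simply left at 0; the last two points are assumptions.

```python
import math


def cross_operator_edges(picture, threshold):
    """Return a 0/1 matrix marking elements whose cross operator is large."""
    rows, cols = len(picture), len(picture[0])
    edges = [[0] * cols for _ in range(rows)]
    for i in range(rows - 1):
        for j in range(cols - 1):
            # Differences along the two diagonals of the 2 x 2 block at (i, j).
            r = math.sqrt(
                (picture[i][j] - picture[i + 1][j + 1]) ** 2
                + (picture[i][j + 1] - picture[i + 1][j]) ** 2
            )
            if r >= threshold:
                edges[i][j] = 1
    return edges


step = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = cross_operator_edges(step, threshold=5)
for row in edges:
    print(row)
```

For this picture of a vertical step in brightness, only the column of elements lying just to the left of the step is marked.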

Other  techniques  developed  for  picture  enhancement  make  use  of 
spatial  frequency  analysis  and  Fourier  transforms.  These  are  well  ex¬ 
plained  by  Duda  and  Hart  (1973).  Figures  5-6  and  5-7  illustrate  the 
power  of  these  techniques  applied  to  a  picture  of  the  Martian  moon 
Phobos,  taken  by  Mariner  9. 

* Quam (1971) referred to this technique as “Custering,” after General
Custer (U.S. Army) who was defeated when surrounded by Indians.

Figure 5-6. (Top) Original picture of Martian moon Phobos, taken by
Mariner 9. (Bottom) High-pass spatial frequency filtering of the original.
(Courtesy of Lynn Quam and Robert Tucker, Stanford AI Project and
Jet Propulsion Laboratory.)

Figure 5-7. (Left) Custering of the original picture of Phobos (see Fig. 5-6).
(Right) High-pass filtering of the custered picture. (Courtesy of Lynn Quam
and Robert Tucker, Stanford AI Project and Jet Propulsion Laboratory.)

Figure 5-8. Line drawing of a real-world scene produced from a television
picture like the one in Figure 5-5. (Courtesy of Karl Pingle,
Stanford AI Project.)

Line-detection techniques are methods for finding significant curves
in a picture matrix that can be used to produce a line drawing. The
problems of making a good program for line detection in pictures are
significant and still largely unsolved. The value of having such a
program is great, however, as the reader will see from the discussions in
this chapter on “identification of objects” and “learning to recognize
structures of simple objects.” The edge-enhancement technique
described above is one simple type of line-detection program. Another
simple method for detection of lines is based on the use of coincidence
predicates. A simple version of this method is the following: For a given
picture matrix with elements $P_{ij}$, having intensity values $I_{ij}$, form a
new picture matrix with elements $P'_{ij}$ having intensity values $I'_{ij}$, where
$I'_{ij} = 1$ if $(I_{ij} - I_{i+1,j})$ and $(I_{i,j+1} - I_{i+1,j+1})$ are both large and of
the same sign, or if $(I_{ij} - I_{i,j+1})$ and $(I_{i+1,j} - I_{i+1,j+1})$ are both large
and of the same sign, where “large” is determined by the specification
of some threshold value. Other methods for line detection have been
investigated by Hueckel (1969), Herskovits (1970), Griffith (1970),
Kelly (1970a,b), Montanari (1971), Hayes and Rosenfeld (1972),
and many others. Figure 5-8 shows a computer-produced line drawing
of a real-world scene like that shown in Figure 5-5. This figure
illustrates some of the problems that currently plague attempts to develop
good line-detection programs. It is difficult to develop programs that
can overlook “meaningless” variations in light intensity (e.g., shadows)
and still detect “meaningful” ones (e.g., the actual boundaries and
edges of objects).
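
The coincidence-predicate method described above can be sketched as follows. The function names are introduced here for illustration, “large” is taken to mean an absolute difference at least equal to a positive threshold, and elements in the last row and column are left at 0; these are assumptions of the sketch.

```python
def coincidence_lines(picture, threshold):
    """Mark elements where two parallel intensity differences coincide."""
    rows, cols = len(picture), len(picture[0])

    def both_large_same_sign(a, b):
        # threshold is assumed positive, so neither difference can be zero.
        return abs(a) >= threshold and abs(b) >= threshold and (a > 0) == (b > 0)

    lines = [[0] * cols for _ in range(rows)]
    for i in range(rows - 1):
        for j in range(cols - 1):
            # Vertical differences in two adjacent columns.
            down_left = picture[i][j] - picture[i + 1][j]
            down_right = picture[i][j + 1] - picture[i + 1][j + 1]
            # Horizontal differences in two adjacent rows.
            across_top = picture[i][j] - picture[i][j + 1]
            across_bottom = picture[i + 1][j] - picture[i + 1][j + 1]
            if (both_large_same_sign(down_left, down_right)
                    or both_large_same_sign(across_top, across_bottom)):
                lines[i][j] = 1
    return lines


step = [[0, 0, 9, 9],
        [0, 0, 9, 9]]
spot = [[0, 0, 0],
        [0, 9, 0],
        [0, 0, 0]]
print(coincidence_lines(step, threshold=5))
print(coincidence_lines(spot, threshold=5))
```

In this small test the vertical step yields one marked element, while the isolated bright spot yields none, since no two parallel differences around it are large with the same sign.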

Perception  of  Regions 

Given  a  line  drawing,  a  vertex  can  be  defined  as  a  point  where  two 
or  more  lines  meet,  and  a  region  as  an  area  of  the  picture  that  is  entirely 
enclosed  by  lines  (and  usually  contains  no  lines).  The  problems  in¬ 
volved  in  finding  and  identifying  “meaningful”  regions  in  a  picture  are 
similar  to  those  for  identifying  lines,  and  are  still  somewhat  unsolved. 

As  might  be  expected,  several  researchers  have  investigated  the 
use  of  local  operators  to  detect  regions  in  a  picture,  similar  to,  but  not 
requiring,  the  use  of  local  operators  to  detect  lines,  as  discussed  above. 
Brice  and  Fennema  (1970)  present  a  good  description  of  a  vision 
system following this approach. The study of such operators has led
to many abstract results in digital topology that may be of interest to
the reader (e.g., see Rosenfeld, 1973).

Perhaps  the  most  intuitive  method  for  recognizing  and  describing 
regions  is  to  make  use  of  line  detection  programs  to  find  lines  in  the 
picture,  use  one  of  many  possible  algorithms  for  locating  vertices,  and 
then  trace  along  the  lines  and  vertices  searching  for  closed  curves.  (A 
closed  curve  is  a  sequence  of  connected  lines  leading  back  to  the  first 
line  in  the  sequence.)  Each  closed  curve  is  part  of  the  boundary  of  a 
region,  and  the  shape  of  the  region  can  be  described  in  terms  of  the 
lines  and  vertices  that  enclose  it.  This  technique  is  suggested  by  Winston 
(1970)  for  recognizing  regions  with  geometric  shapes  in  a  line  drawing; 
Winston’s  general  approach  is  described  in  the  next  section. 
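
The tracing step can be illustrated with a standard boundary-walking procedure for drawings made of straight lines. This is a sketch of one common method, not a description of Winston's program: the function name, the representation of a drawing as vertex coordinates plus vertex pairs, and the rule of always turning to the next line clockwise at each vertex are assumptions made here.

```python
import math


def trace_closed_curves(coords, lines):
    """Trace every closed curve in a line drawing.

    coords maps a vertex name to (x, y); lines is a list of vertex pairs.
    Each line is walked once in each direction, so one of the curves
    returned is the outer boundary of the whole drawing.
    """
    # For each vertex, its connected vertices in counterclockwise order.
    ring = {v: [] for v in coords}
    for a, b in lines:
        ring[a].append(b)
        ring[b].append(a)
    for v, around in ring.items():
        vx, vy = coords[v]
        around.sort(key=lambda w: math.atan2(coords[w][1] - vy, coords[w][0] - vx))

    used = set()      # directed lines already walked
    curves = []
    for start in ring:
        for first in ring[start]:
            u, v = start, first
            curve = []
            while (u, v) not in used:
                used.add((u, v))
                curve.append(u)
                around = ring[v]
                # Leave v along the line next clockwise from the one we came in on.
                w = around[around.index(u) - 1]
                u, v = v, w
            if curve:
                curves.append(curve)
    return curves


# A square A-B-C-D with one diagonal from A to C.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
lines = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
curves = trace_closed_curves(coords, lines)
print(curves)
```

For the square with a diagonal, three closed curves are found: the two triangles and the outer boundary of the square.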

An interesting technique for describing the shape of a region is
medial axis transformation, or prairie fire analysis. Given a region such
as shown here, we may imagine that the interior of the region is covered
with highly flammable grass and the exterior of the region is empty
(presumably covered with asphalt). Suppose we simultaneously light
a fire all along the boundary of the region. The fire will then spread
inward and be quenched where it meets itself. Each point where two or
more fire fronts meet and quench each other is known as a quench
point. The collection of quench points for our example will look something
like the next drawing. This collection of quench points, or skeleton,
may be taken as a description of the shape of all regions that will
produce it. (The precise initial region may be constructed if some
additional information is given.) Duda and Hart (1973) discuss this and
other methods of region recognition and description in greater detail.
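
On a picture matrix, the prairie fire can be imitated by letting the fire advance one element per time step. The sketch below is a rough digital approximation under stated assumptions: the region is given as a matrix of 0s and 1s, the fire spreads only between horizontally and vertically adjacent elements, and a quench point is taken to be any element that burns no earlier than all of its neighbors. The function name is introduced here for illustration.

```python
from collections import deque


def prairie_fire(region):
    """Return (burn_time, skeleton) for a region given as a 0/1 matrix."""
    rows, cols = len(region), len(region[0])
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def inside(i, j):
        return 0 <= i < rows and 0 <= j < cols and region[i][j] == 1

    # Light the fire at every region element that touches the exterior.
    burn_time = {}
    front = deque()
    for i in range(rows):
        for j in range(cols):
            if inside(i, j) and any(not inside(i + di, j + dj) for di, dj in steps):
                burn_time[(i, j)] = 0
                front.append((i, j))

    # Spread inward, one element per time step (breadth-first search).
    while front:
        i, j = front.popleft()
        for di, dj in steps:
            cell = (i + di, j + dj)
            if inside(*cell) and cell not in burn_time:
                burn_time[cell] = burn_time[(i, j)] + 1
                front.append(cell)

    # Approximate quench points: elements with no later-burning neighbor.
    skeleton = {
        (i, j)
        for (i, j), t in burn_time.items()
        if all(burn_time.get((i + di, j + dj), -1) <= t for di, dj in steps)
    }
    return burn_time, skeleton


bar = [[1] * 7 for _ in range(3)]       # a 3 x 7 rectangular region
burn_time, skeleton = prairie_fire(bar)
print(sorted(skeleton))
```

For the 3 x 7 rectangle the approximate skeleton consists of the interior elements of the middle row together with the four corner elements, which is a coarse version of the skeleton of a rectangle (a central segment with branches running toward the corners).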

Perception  of  Objects 

Historically,  the  first  program  to  successfully  use  vision  to  recog¬ 
nize  objects  in  an  environment  was  written  by  Roberts  (1963).  This 
program  used  local  operators  to  transform  a  digitized  picture  into  a 
line  drawing,  which  was  then  searched  for  vertices  and  regions.  Relevant 
information  about  each  line,  vertex,  and  region  would  be  computed 
and  stored  in  a  list  structure;  e.g.,  each  vertex  would  have  associated 
with  it  a  description  of  the  regions  surrounding  it.  The  program  was 
given  a  set  of  similar  list  structures  that  presented  the  same  kind  of 
information  about  each  of  the  edges,  vertices,  and  surfaces  of  the  three 
basic  objects  it  could  recognize  (cubes,  wedges,  and  hexagonal  prisms). 
The  program  would  attempt  to  make  a  preliminary,  consistent  matching 
of  each  vertex,  line  and  region  of  the  line  drawing  against  a  correspond¬ 
ing  element  in  one  of  these  three  objects.  Given  this  matching,  the  pro¬ 
gram  would  compute  the  projective  geometry  transformation  that  would 
yield  the  best  fit  between  each  portion  of  the  line  drawing  and  the 
object  to  which  it  had  been  corresponded— with  a  good  enough  fit,  the 
object  would  be  “recognized”  as  having  produced  that  portion  of  the 
line  drawing.  Roberts’  program  was  able  to  recognize  compound  ob¬ 
jects,  made  by  piecing  together  transformations  of  cubes,  wedges,  and 
hexagonal  prisms. 

Guzman (1968a,b) made the next significant advance in visual
perception by machines. He wrote the first program which did not
require stored descriptions of the objects it could recognize, and did not
proceed by trying to match such descriptions against line drawings of
the scene. Given a picture such as that in Fig. 5-9, in which the lines
and vertices have been detected and correctly labeled and in which the
regions of the picture have been numbered as indicated, Guzman’s
program (called SEE) will identify 12 objects, as indicated in Table 5-1.

Figure 5-9. (Guzman, 1968, reprinted with permission.)


TABLE 5-1. Identification of Objects by SEE

Object  Regions             Object  Regions
  1.    3,2,1                 7.    25,23,22
  2.    32,33,27,26           8.    14,13,15
  3.    28,31                 9.    10,16,11,12
  4.    19,20,34,30,29       10.    18,9,17
  5.    36,35                11.    7,8
  6.    24,5,21,4            12.    38,37,39

What is most impressive about SEE is that it can make this identification
without knowing anything in detail about specific polyhedra or
about what to expect in Fig. 5-9. The operation of SEE is based only
on the use of information collected locally at each vertex in the picture.

SEE begins operation when it is presented with a special description
of a picture. The description contains information about the regions
in the picture, the vertices in the picture, and the background of the
picture. For the simple picture ONE shown in Fig. 5-10, SEE would be
given the following information:

Figure 5-10. ONE, a picture of a simple scene. (Guzman, 1968,
reprinted with permission.)

Regions:     (1 2 3 4 5 6)
             A list (not necessarily ordered) of the regions
             composing scene ONE
Vertices:    (A B C D E F G H I J K)
             Unordered list of vertices contained in scene ONE
Background:  (6)
             Unordered list of regions composing the background
             of scene ONE

In addition, SEE is given information about each of the regions and
vertices named in this description. For regions, this information
describes the regions that are neighbors to each region; the KVERTICES of
each region; and the FOOP property of the region. For region 2 in picture
ONE, this information is as follows:

NEIGHBORS:  (3 4 6 1 6)
            Counterclockwise ordered list of all regions
            that neighbor region 2
KVERTICES:  (D E A C K)
            Counterclockwise ordered list of all vertices
            that belong to region 2
FOOP:       (3 D 4 E 6 A 1 C 6 K)
            Counterclockwise ordered list of alternating
            neighbors and kvertices of region 2

Each  of  these  properties  of  a  given  region  could  be  determined  rather 
simply  by  a  program  that  would  scan  along  the  lines  in  a  good  line 
drawing. 
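
The relation among the three region properties can be seen in a small Python sketch. The dictionary layout and the helper name `foop` are assumptions made here for illustration, as is the premise that the two lists start at corresponding positions on the boundary (which holds for region 2 as listed above); nothing is implied about how SEE itself stores this information.

```python
from itertools import chain

# Description of region 2 of scene ONE, as listed in the text.
region_2 = {
    "NEIGHBORS": [3, 4, 6, 1, 6],
    "KVERTICES": ["D", "E", "A", "C", "K"],
}


def foop(neighbors, kvertices):
    """Alternate neighbors and kvertices, both in counterclockwise order."""
    return list(chain.from_iterable(zip(neighbors, kvertices)))


region_2["FOOP"] = foop(region_2["NEIGHBORS"], region_2["KVERTICES"])
print(region_2["FOOP"])
```

The result reproduces the FOOP list (3 D 4 E 6 A 1 C 6 K) given for region 2.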

For each vertex in a picture, SEE is given information that describes
the x and y coordinates, or position of the vertex; the other vertices to
which it is connected; the regions to which it belongs; and the type of
the vertex (Fig. 5-11). Thus, for vertex H in picture ONE, SEE is given

POSITION:   XCOR 3.0, YCOR 15.0
            x-coordinate and y-coordinate of H
NVERTICES:  (I G D)
            Counterclockwise ordered list of vertices to
            which H is connected
NREGIONS:   (3 5 4)
            Counterclockwise ordered list of regions to
            which H belongs
TYPE:       FORK
            Type name of the vertex (see Fig. 5-11)


The type name of a vertex is the name of one of eight possible
classes to which it may belong, depending on the number of lines and
the size of the angles that form the vertex (see Fig. 5-11). These
classes are exhaustive and mutually exclusive in that any vertex must
belong to one and only one of them. In addition, for each vertex SEE
is given a counterclockwise-ordered list of alternating regions and
vertices to which it belongs or is connected, and SEE is given other
information about the size of the angles belonging to the vertex, etc.
Again, all this information could be determined by a program that
would scan along the lines in a good line drawing, such as that for
scene ONE.
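The region and vertex descriptions above might be held in simple records, as in the following Python sketch. The class names, the field layout, and the `foop` method are assumptions made for illustration (Guzman's program was not written this way); only the values for region 2 and vertex H come from the text. The sketch also assumes that the NEIGHBORS and KVERTICES lists are aligned, as they are in the example, so that FOOP can be obtained by interleaving them.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Description of one region (hypothetical record layout)."""
    name: int
    neighbors: list   # counterclockwise list of neighboring regions
    kvertices: list   # counterclockwise list of the region's vertices

    def foop(self):
        # Interleave the two lists: neighbor, vertex, neighbor, vertex, ...
        result = []
        for neighbor, vertex in zip(self.neighbors, self.kvertices):
            result.extend([neighbor, vertex])
        return result


@dataclass
class Vertex:
    """Description of one vertex (hypothetical record layout)."""
    name: str
    xcor: float
    ycor: float
    nvertices: list   # counterclockwise list of connected vertices
    nregions: list    # counterclockwise list of regions it belongs to
    type: str         # one of the eight type names, e.g. "FORK"


# Values taken from the descriptions of region 2 and vertex H.
region_2 = Region(name=2, neighbors=[3, 4, 6, 1, 6], kvertices=list("DEACK"))
vertex_h = Vertex("H", 3.0, 15.0, ["I", "G", "D"], [3, 5, 4], "FORK")

print(region_2.foop())  # [3, 'D', 4, 'E', 6, 'A', 1, 'C', 6, 'K']
```

The printed list agrees with the FOOP property given for region 2.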

Given this information, SEE proceeds in a heuristic manner to find
evidence (Fig. 5-12) that regions in the picture should be grouped
together and considered as surfaces of a three-dimensional object.
Initially, SEE considers each region in the picture to be within an
individual nucleus; no two regions share the same initial nucleus.
However, if SEE decides that two regions in separate nuclei should be
grouped together (considered part of the same object), it will merge
their nuclei, placing all regions in both nuclei within the same, new
nucleus. Thus, SEE will eventually build up nuclei containing many
regions, depending on the way it is guided by the evidence in the
picture. When it cannot find any more evidence or merge any more
nuclei, it will stop and report each nucleus as a separate object in
the scene, consisting of the appropriate regions.

[Figure labels: MATCHED T's; PEAK]

Figure 5-11. Types of vertices. (Winston, 1970, reprinted with
permission.)

Figure 5-12. Evidence from vertices. (Guzman, 1968, reprinted with
permission.)

SEE distinguishes between two types of evidence, known as strong
and weak, and is capable of hunting for a variety of different clues that
indicate that two regions in a picture should be grouped together. Once
it has found such a clue, it decides whether the clue is strong or weak
evidence, and it notes that the clue was found by placing either a strong
or weak link between the two regions and the nuclei to which they
belong.* Its decision to merge two nuclei is based solely on the number
of strong and weak links between them, not on the clues that caused
those links to be formed.

Some of the clues that SEE uses are listed below (see Fig. 5-12).

Fork. If three regions meet at a vertex of the FORK type, and
none is in the background, strong links between them will be formed
(with some exceptions: see Guzman, 1968a,b).

Arrow. If three nonbackground regions meet at a vertex of the
ARROW type, a strong link will be formed between the two regions
that have the small (less than 180°) angles of the vertex.

X. If four nonbackground regions meet at a vertex of the X type,
and if the vertex is not formed by the intersection of two straight lines,
then two strong links are established, as in Fig. 5-12.

Peak. If several nonbackground regions meet at a vertex of the
PEAK type, all regions except the one containing the large angle
(greater than 180°) are given strong links to each other.

T's. SEE attempts to find vertices of type T that match each other.
Two vertices of type T match each other if their central segments are
collinear and if they are “facing each other.” SEE establishes strong
links between regions of matching T’s, as in Fig. 5-12, providing these
links do not cause a background region to be linked to a nonbackground
region.

Leg. An ARROW type vertex is a LEG if one of its small angles leads
(if necessary, through a chain of matched T’s) to an angle which has
one side parallel to the central segment of the arrow. A weak link is
formed between the two nonbackground regions of a LEG type vertex
that have the small angles of the vertex.

In addition to these rules, SEE makes use of other clues in its
search for strong and weak evidence. For a complete description, the
reader is encouraged to see Guzman (1968a,b). However, it should be
noted that vertices of types L, K, and MULTI are not used by SEE to
establish links.
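As an illustration of how clues turn into links, the following Python sketch covers only the Fork and Arrow rules as stated above. It is a simplification under stated assumptions: the function name `strong_links`, its parameters, and the `small_angle_regions` argument (which presumes that the vertex description already says which two regions hold the small angles) are hypothetical, and the exceptions Guzman lists for forks are ignored.

```python
from itertools import combinations


def strong_links(vertex_type, regions, background, small_angle_regions=()):
    """Return the strong links suggested by one vertex.

    Each link is a frozenset of two region names. Only the FORK and
    ARROW rules are sketched; every other vertex type yields no links.
    """
    # Both rules require that no region at the vertex is background.
    if any(region in background for region in regions):
        return set()
    if vertex_type == "FORK" and len(regions) == 3:
        # Link every pair of the three regions meeting at the fork.
        return {frozenset(pair) for pair in combinations(regions, 2)}
    if (vertex_type == "ARROW" and len(regions) == 3
            and len(small_angle_regions) == 2):
        # Link only the two regions that hold the small angles.
        return {frozenset(small_angle_regions)}
    return set()


# Vertex H of scene ONE is a FORK joining regions 3, 5, and 4;
# region 6 is the background.
links_at_h = strong_links("FORK", [3, 5, 4], background={6})
```

For vertex H this yields three strong links, one for each pair among regions 3, 5, and 4.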

* The operations of forming, merging, and linking nuclei are all
conducted by SEE on a data structure separate from that for the original
picture. In essence, SEE builds up a new description of the picture,
using nuclei and links, and modifies this description by reference back
to the original picture and description of its regions and vertices.

When SEE has established as many strong and weak links as possible
between the regions in a picture, it makes use of three rules for the
merging of the nuclei that contain the regions:

1.  If  two  nuclei  are  connected  by  two  or  more  strong  links, 
they  are  merged  into  a  single  nucleus. 

2.  If  the  first  rule  cannot  be  applied  to  any  of  the  nuclei,  then 
if  two  nuclei  are  connected  by  a  strong  link  and  a  weak  link, 
they  are  merged;  having  made  this  merge,  go  back  to  the 
first  rule  and  see  if  it  can  be  applied. 

3.  If  neither  the  first  nor  second  rule  can  be  applied,  and  there 
is  a  nucleus  containing  a  single  region  that  is  joined  by  a 
strong  link  to  another  nucleus  (and  has  no  other  links  leaving 
it),  then  the  two  nuclei  are  merged. 
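The three merging rules can be sketched in Python as follows. This is an illustrative reading of the rules, not Guzman's code: the function name `merge_nuclei`, the representation of links as `(region, region, kind)` tuples, and the choice to return to the first rule after every merge (the text states this only for the second rule) are assumptions.

```python
from collections import Counter


def merge_nuclei(regions, links):
    """Group regions into nuclei using the three merging rules.

    `links` is a list of (region_a, region_b, kind) tuples, where kind
    is "strong" or "weak". Returns a set of frozensets of regions.
    """
    nucleus = {r: frozenset([r]) for r in regions}  # region -> its nucleus

    def link_counts():
        # Count strong and weak links between each pair of distinct nuclei.
        counts = {}
        for a, b, kind in links:
            na, nb = nucleus[a], nucleus[b]
            if na != nb:
                counts.setdefault(frozenset([na, nb]), Counter())[kind] += 1
        return counts

    def find_pair():
        counts = link_counts()
        for pair, n in counts.items():        # rule 1: two or more strong links
            if n["strong"] >= 2:
                return pair
        for pair, n in counts.items():        # rule 2: a strong and a weak link
            if n["strong"] >= 1 and n["weak"] >= 1:
                return pair
        for pair, n in counts.items():        # rule 3: lone region, one strong link
            if n["strong"] == 1 and n["weak"] == 0:
                for nuc in pair:
                    leaving = sum(m["strong"] + m["weak"]
                                  for p, m in counts.items() if nuc in p)
                    if len(nuc) == 1 and leaving == 1:
                        return pair
        return None

    while (pair := find_pair()) is not None:
        merged = frozenset().union(*pair)
        for r in merged:
            nucleus[r] = merged
    return set(nucleus.values())


example = merge_nuclei(
    [1, 2, 3, 4],
    [(1, 2, "strong"), (1, 2, "strong"),   # rule 1 merges 1 and 2
     (2, 3, "strong"), (1, 3, "weak"),     # rule 2 then adds 3
     (3, 4, "strong")],                    # rule 3 finally adds 4
)
```

Because the rules are tried in order on every pass, a merge made by a later rule can enable the first rule again, which matches the instruction to go back to the first rule.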

These heuristic rules are sufficient to enable SEE to identify objects
in many rather complex scenes, even when SEE’s “view” of an object
may be partially occluded by other objects. In general, when SEE makes
mistakes, it errs conservatively by not grouping together regions that
humans would think plausibly belong to the same object. Thus, for the
scene in Fig. 5-13, SEE groups all the regions together in the same
plausible manner that humans would, except for regions 41 and 42; it
leaves these regions in their initial nuclei and reports them as belonging
to separate objects.

It  should  be  pointed  out  that,  given  a  single  picture  of  a  scene,  it 
is  impossible  to  prove  that  any  of  the  regions  in  the  picture  actually 
belong  to  the  same  object.  Each  region  in  the  scene  could  be  the  base  of 
a  pyramid  such  that  all  other  faces  of  that  pyramid  are  hidden  from 
view;  thus,  no  two  visible  regions  would  belong  to  the  same  object.  In 
effect,  any  program  that  identifies  objects  from  a  single  picture  of  a 
scene  must  be  based  on  notions  of  plausibility  for  real-world  environ¬ 
ments;  i.e.,  it  must  be  a  heuristic  program. 

Guzman discussed a number of extensions that could be made to
his program, and the reader is encouraged to investigate his work
further. Recently, Huffman (1971), Clowes (1971), and Waltz (1972)
have written programs for object recognition which are similar to
Guzman’s but have a more algorithmic design. Like SEE, these programs
rely on local information about the vertices in a line drawing. Huffman’s
work also discusses the recognition of smooth, curved, nonpolyhedral
objects, and the recognition of “impossible” objects. Waltz devotes
special attention to the recognition of shadows and the detection of
missing lines in a line drawing; this is especially important because the
performance of these programs is highly dependent on the quality of
the line drawings available to them.


LEARNING TO RECOGNIZE STRUCTURES
OF SIMPLE OBJECTS

The  problem  of  object  identification  by  visual  perception  systems 
would  be  intractable  if  all  objects  in  the  real  world  were  to  be  identified 
visually using only such features as their texture, color, abstract shape,
and the angles formed by their edges. Many objects in the real world
are composed of other, simpler objects, somewhat independently of
how these features are possessed by the simpler objects. Thus, we can
recognize  a  railroad  train  regardless  of  whether  its  cars  are  boxcars, 
flatcars,  or  passenger  cars  with  rounded  corners  and  edges,  and  regard¬ 
less  of  what  texture,  color,  or  abstract  shape  the  cars  may  be  said  to 
possess.  Our  recognition  of  the  railroad  train  depends  as  much  on  the 
“structure”  formed  by  the  objects  that  make  up  the  train  as  it  does  on 
the  objects  themselves.  This  section  presents  a  brief  description  of  a 
computer  program,  written  by  Winston  (1970),  which  is  capable  of 
learning  to  recognize  structures  of  simple  objects.  Although  computer 
visual-perception  systems  have  a  long  way  to  go  if  they  are  ever  to 
match  human  visual  performance,  it  is  likely  that  future  developments 
in  pattern-recognizing  systems  will  use  Winston’s  work  as  a  starting 
point. 

Figure 5-14. Blocks and wedges. (Winston, 1970, reprinted with
permission.)

Winston’s program is designed to use the type of description of a
visual scene that is provided by Guzman’s program (see the preceding
section). The information in a Guzman type of description of a visual
scene corresponds to a labeling of the regions and of the vertices formed
by the lines in the scene, plus a labeling of “objects” in the visual scene
which appear to be made up of the labeled regions and vertices.
Winston’s program is capable of recognizing various types of objects and
various relations between them, and of describing the visual scene as a
structure made up of certain objects and relations. The major types of
objects and relations recognized in Winston’s (1970) program are
bricks, wedges, and above, supports, in-front-of, right-of, left-of, and
marrys.* Winston showed that his program could be modified to
recognize other objects and relations. When shown the scene in Fig. 5-14,
Winston’s program will recognize the objects and relations listed in
Table 5-2. The program will generate a description of the scene that

TABLE 5-2. Objects and Relations for Fig. 5-14


A  supported-by 

B  C 

in-front-of 

F  G 

B 

K 

— 

C 

D  E 

— 

D 

— 

E 

E 

— . 

— 

F 

E 

G 

— 

— 

H 

I  J 

— 

I 

T 

— 

— 

J 

K 

H 

E 

corresponds to the graphlike structure shown in Fig. 5-15. Such a
description will be called a description graph. The greater part of
Winston’s program is concerned with comparing description graphs of
visual scenes to each other, and with developing general description
graphs that can represent sets of visual scenes. To do this, Winston
allows his description graphs to contain nodes that may represent groups
of objects and to contain arcs that may represent relations between
groups of objects and objects. For example, one such relation is
one-part-is, which holds for nodes A and B if A represents a group of
objects and B represents an object “in” A. Furthermore, Winston allows
relations themselves to be described by description graphs in which
nodes may represent relations, and arcs may represent relations
between relations (illustrated below). Winston’s use of description graphs
is sufficiently general that not only objects and structures of objects
(i.e., scenes), but also relations, sets of scenes, relations between scenes,
comparisons of scenes, relations between description graphs, and
comparisons of description graphs may all be described by description graphs.

* These objects and relations all have approximately the meanings that
humans normally give them, except for marrys. Two objects are said to
“marry” each other if the objects have faces that touch each other and
have at least one common edge.

Figure 5-15. A description graph of Figure 5-14. See Table 5-2.

As an example, we may define an arch to be a group of objects
(A, B, and C) such that B and C are each “a-kind-of” brick and A is
“a-kind-of” object; B and C are “standing” and A is “lying”; A
“must-be-supported-by” B and A “must-be-supported-by” C; B and C
“must-not-abut” (i.e., must not have faces that touch each other).
Figure 5-16 shows some simple examples of groups of objects, some of
which are arches and some of which are not. Figure 5-17 shows part of a
description graph for the set of scenes that should be recognized as
“arches,” according to the definition. Note that this description graph
includes nodes that represent relations (must-be-supported-by,
supported-by, etc.) and arcs that represent relations between relations
(modification-of, must-be-satellite, etc.). Winston’s computer program
can use this description graph to identify correctly those groups of
objects shown in Fig. 5-16 which are arches and those which are not
arches. Moreover, Winston’s program can use this description
graph to recognize that the entire group of objects shown in Fig. 5-18
is “a-kind-of” arch. In fact, the computer program will find and identify
five groups of objects in Fig. 5-18 that are each “a-kind-of” arch.
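The arch definition can be read as a set of relations that must hold and a set that must not. The Python sketch below checks a scene against that reading. It is only an illustration: the representation of a description graph as a set of (subject, relation, value) triples, the function name `is_arch`, and the relation name "posture" are assumptions, not Winston's data structures. The condition that A is “a-kind-of” object is treated as always satisfied.

```python
def is_arch(scene, a, b, c):
    """Return True if objects a, b, c in `scene` satisfy the arch definition.

    `scene` is a set of (subject, relation, value) triples, a simplified
    stand-in for a description graph.
    """
    required = {
        (b, "a-kind-of", "brick"),
        (c, "a-kind-of", "brick"),
        (b, "posture", "standing"),
        (c, "posture", "standing"),
        (a, "posture", "lying"),
        (a, "supported-by", b),      # must-be-supported-by
        (a, "supported-by", c),      # must-be-supported-by
    }
    forbidden = {(b, "abut", c), (c, "abut", b)}   # must-not-abut
    return required <= scene and not (forbidden & scene)


arch_scene = {
    ("B", "a-kind-of", "brick"), ("C", "a-kind-of", "brick"),
    ("B", "posture", "standing"), ("C", "posture", "standing"),
    ("A", "posture", "lying"),
    ("A", "supported-by", "B"), ("A", "supported-by", "C"),
}
# A nonarch: the same scene, but the two supports touch each other.
touching_scene = arch_scene | {("B", "abut", "C")}

print(is_arch(arch_scene, "A", "B", "C"))      # True
print(is_arch(touching_scene, "A", "B", "C"))  # False
```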

What is most notable about Winston’s program is the fact that
it is a learning program. By this we mean that Winston’s program is
capable of developing a description graph for a set of scenes that
it is told are examples of some pattern. Thus, the program is capable
of developing the description graph for “arch” shown in Fig. 5-17, if
it is shown only the groups of objects (scenes) in Fig. 5-16, and if it
is told whether each group of objects in Fig. 5-16 is or is not an arch.
It can then use this description graph to identify other, previously
unpresented groups of objects (such as that in Fig. 5-18) as being arches,
without being told that they are arches. Thus, we can reasonably say
that the program “learns” to recognize the pattern “arch.” Similarly,
the program can learn to recognize “columns,” “houses,” “pedestals,”
“tents,” “tables,” and “arcades” (Fig. 5-19).

Figure 5-16. Arches and nonarches. (Panels: Arch, Nonarch, Nonarch.)

Although  Winston’s  program  is  a  “learning”  program,  it  does  re¬ 
quire  a  “teacher”  to  tell  it  what  patterns  to  recognize  (e.g.,  “arch”  and 
“house”)  and  to  give  it  pattern  examples  (scenes)  for  each  pattern. 
Winston’s  thesis  had  a  great  deal  to  say  about  the  subject  of  “teaching.” 
In  particular,  he  emphasized  the  value  of  presenting  to  the  computer 
scenes  that  are  “near  misses.”  A  near-miss  is  a  scene  that  is  not  an 
example  of  the  pattern  being  taught  because  it  fails  to  satisfy  only  one 
condition  of  the  pattern  rule  for  the  pattern.  Each  of  the  nonarches  in 
Fig.  5-16  is  a  near-miss  to  the  pattern  “arch.” 
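The value of a near-miss is that the single difference between it and a true example points at exactly one condition of the pattern rule. The Python sketch below illustrates that idea in a deliberately reduced form; it is not Winston's learning procedure. The function name `learn_from_near_miss` and the triple representation are assumptions, while the "must-be-" and "must-not-" prefixes follow the relation names used in the arch definition.

```python
def learn_from_near_miss(example, near_miss):
    """Compare an example with a near-miss and return the condition learned.

    Both arguments are sets of (subject, relation, object) triples.
    Returns None unless the two scenes differ in exactly one relation.
    """
    missing = example - near_miss   # relations the near-miss lacks
    extra = near_miss - example     # relations only the near-miss has
    if len(missing) + len(extra) != 1:
        return None
    if missing:
        subject, relation, obj = next(iter(missing))
        return (subject, "must-be-" + relation, obj)
    subject, relation, obj = next(iter(extra))
    return (subject, "must-not-" + relation, obj)


arch = {("A", "supported-by", "B"), ("A", "supported-by", "C")}
touching = arch | {("B", "abut", "C")}            # supports touch
unsupported = arch - {("A", "supported-by", "C")}  # one support missing

print(learn_from_near_miss(arch, touching))     # ('B', 'must-not-abut', 'C')
print(learn_from_near_miss(arch, unsupported))  # ('A', 'must-be-supported-by', 'C')
```

A scene that differs from the example in several relations at once gives no single condition to emphasize, which is why the sketch returns nothing in that case.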

Figure 5-17. A description graph for the set of arches. (Winston, 1970,
reprinted with permission.) [Labels in the graph include
MUST-BE-SATELLITE, MODIFICATION-OF, MUST-NOT-ABUT,
MUST-NOT-SATELLITE, and SPATIAL-RELATION.]

Figure 5-19. Examples of shapes.

Because comparisons of description graphs are themselves
represented as description graphs, Winston’s program can be used to solve
three-dimensional analogy problems similar to those solved by Evans’
ANALOGY program (see Exercise 5-4). Thus, if presented with
scenes that are labeled as in Fig. 5-20, and if asked to find a value for
X such that “A is to B as C is to X” will be true, Winston’s program
will choose the appropriate labeled scene.

Winston presented a number of suggestions for further work on
pattern perception systems. One of the most desirable extensions he
suggested is the design of programs that can learn to recognize patterns
with pattern rules that are partly “functional.” A pattern is said to have
a functional pattern rule if its pattern examples are each required to be
capable of “performing a function.” Thus, the pattern “table” has a
functional pattern rule if we require that each of its pattern examples
be capable of supporting a plate with food, and silverware, glasses, etc.
The subject of functional pattern rules is still an open problem of AI
research.

The use of graphlike structures as descriptions for pattern examples
and rules has been considered by other researchers, including Shaw
(1968), Clark and Miller (1966), Pratt and Friedman (1971), and
Barrow, Ambler, and Burstall (1972). This approach is related to
attempts (viz., Miller and Shaw, 1968; Banerji, 196[?]; [...], 1964;
Tachibana, 1972) to develop linguistic methods for visual pattern
perception. The discussion of this subject is resumed in Chapter 7.


SOME  PROBLEMS  FOR  PATTERN 
PERCEPTION  SYSTEMS 


At the moment, computers are effectively blind. Pattern (especially
visual pattern) perception is one of two* major areas of investigation
for which AI research has not yet been able to give computers a “level
of competence” approaching that of people. This is true despite the
impressive results of Guzman, Winston, and others. Also, visual pattern
perception is a major bottleneck to the development of many useful
mechanical intelligences: It is a necessity, for example, for machines
that would work intelligently in a factory or could navigate
independently on another planet. This section summarizes some of the
current, major problems confronting AI research on pattern perception
systems.

* The other area is “semantic information processing.” Problems
in these areas appear to be strongly related.

First and foremost, it is desirable to put together the hierarchy we
have described. Both Guzman’s and Winston’s programs require perfect
line drawings, which currently are supplied by people. There does not
yet exist a line detection program that can consistently supply good
line drawings of real-world scenes, because the ability to find a line
often requires global information indicating which local information
about the picture should be ignored or given special attention. Thus,
either program should be able to cause the eye system or the line finder
to search particular areas of the scene, change focusing or threshold
settings for those areas, and perform other functions. Either program
should be able to check new lines that are produced in this manner, and
use those lines that will make their own tasks of object and structure
perception easier.

Besides  integrating  the  hierarchy,  a  number  of  extensions  can  be 
easily  suggested.  Programs  should  be  able  to  detect  and  make  use  of 
curved  lines,  color,  and  texture.  Programs  should  be  able  to  recognize 
structures  that  are  pattern  examples  of  patterns  with  functional  pattern 
rules.  Programs  should  be  able  to  generate  descriptions  of  motion  oc¬ 
curring  in  scenes  and  (ultimately)  make  real-time  use  of  such  descrip¬ 
tions.  Programs  should  be  capable  of  detecting  optical  illusions,  and 
compensating  for  them.  Programs  should  be  able  to  accept,  and  describe 
in  visual  terms,  information  provided  by  other  perceptual  systems  (e.g., 
auditory  or  tactile  information). 

Although  current  work  is  being  done  on  these  matters  (viz.,  Bajcsy, 
1972;  Shirai,  1972)  it  is  likely  that  computers  will  not  approach  human 
visual  competence  for  some  time,  depending  upon  the  rate  at  which  the 
processes  of  visual  perception  can  be  understood,  implemented  and 
tested in high-level programming languages (such as LISP, PLANNER,
CONNIVER, and QA4) and, ultimately, implemented in hardware. Even
so,  substantial  progress  has  been  made  in  the  study  of  pattern  percep¬ 
tion,  if  only  because  the  statement  of  these  goals  is  more  meaningful 
now  than  it  would  have  been  ten  years  ago. 


NOTES 

5-1. At the root of most pattern perception models is the “Pandemonium”
paradigm devised by Selfridge (1958). The Pandemonium machine is
composed of decision makers, or demons (physicists may recall Maxwell’s
demon, an imaginary being capable of acting intelligently on a microscopic
level and thus controverting the law of entropy) arranged in a latticelike
structure such as shown here. At the bottom of this lattice is the real world.

[Diagram: a lattice of demons, with the real world at the bottom and a
single demon at the top.]

Each of the demons immediately above the real world scans it and makes a
decision concerning the existence of some feature (i.e., the extent to which
the real world satisfies the pattern rule for a pattern); the demons at higher
levels scan their predecessors and make decisions concerning them. The
topmost demon makes the final decision as to whether a pattern example is
present. (Essentially this much had been extensively developed by von
Neumann, 1951, in a general model for experiments or observations on physical
systems, especially quantum mechanical ones.) In addition, Selfridge
suggested the use of feedback to alter the nature of the lattice, and proposed an
evolutionary scheme for “demon selection.” (See Chapter 8.) In their
description of hierarchical synthesis, Barrow, Ambler, and Burstall (1972)
provide an elegant and efficient extension of this idea.
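A toy version of the lattice described in this note can be written in a few lines of Python. Everything specific in the sketch is an assumption made for illustration: the “real world” is a string of stroke characters, the bottom demons report the fraction of strokes of one kind, each higher demon forms a weighted sum of those reports, and the topmost decision is simply the largest sum. The weights are invented, and the feedback and demon-selection schemes are not modeled.

```python
def pandemonium(world, feature_demons, pattern_demons):
    """Run a two-level lattice of demons and return (decision, shouts)."""
    # Bottom level: each demon scans the world for one feature.
    features = {name: demon(world) for name, demon in feature_demons.items()}
    # Next level: each demon scans its predecessors and weighs their reports.
    shouts = {
        pattern: sum(weight * features[name] for name, weight in weights.items())
        for pattern, weights in pattern_demons.items()
    }
    # Topmost demon: choose the pattern whose demon shouts loudest.
    decision = max(shouts, key=shouts.get)
    return decision, shouts


def fraction_of(symbols):
    """Build a feature demon reporting the fraction of strokes in `symbols`."""
    def demon(world):
        if not world:
            return 0.0
        return sum(ch in symbols for ch in world) / len(world)
    return demon


# Hypothetical demons and weights for telling an "H" from an "A".
FEATURE_DEMONS = {
    "vertical": fraction_of("|"),
    "horizontal": fraction_of("-"),
    "diagonal": fraction_of("/\\"),
}
PATTERN_DEMONS = {
    "H": {"vertical": 1.0, "horizontal": 0.5},
    "A": {"diagonal": 1.0, "horizontal": 0.5},
}

decision_h, _ = pandemonium("|-|", FEATURE_DEMONS, PATTERN_DEMONS)
decision_a, _ = pandemonium("/-\\", FEATURE_DEMONS, PATTERN_DEMONS)
print(decision_h, decision_a)  # H A
```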


EXERCISES 


5-1. Design a computer program that, given the line drawing of the Maze of
Dedalus (Exercise 3-1), can find a path out.


5-2. What subproblems might a computer need to solve in order to put together
jigsaw puzzles?

5-3. Write a paper discussing the interrelationships between the problems of
pattern recognition, pattern matching, pattern classification, and pattern
description.

5-4. What subproblems are involved in solving the following analogy problem?
Find X such that A : B :: C : X.


5-5.  Investigate  ways  of  describing  and  generating  potentially  infinite  structures 
such  as  these: 


[Figure: example structures.]

(Watanabe, 1971, reprinted with permission.)

5-6.  Describe  how  a  computer  might  be  programmed  to  recognize  human  faces. 

5-7. What are the visual subproblems to be solved by a computer program for
tying and untying knots?


A Syllogism worked out.

That story of yours, about your once meeting the
sea-serpent, always sets me off yawning;

I never yawn, unless when I’m listening to something
totally devoid of interest.


The Premisses, separately.


The Premisses, combined.


The Conclusion.


That story of yours, about your once meeting the
sea-serpent, is totally devoid of interest.


THEOREM PROVING


INTRODUCTION 

The  ability  to  prove  theorems  in  mathematics  is  a  good  example 
of  an  intellectual  faculty  and  one  that  is  relevant  to  the  construction  of 
reasoning  programs.  This  chapter  is  an  introduction  to  the  study  of 
computer  programs  that  are  capable  of  finding  proofs  for  theorems 
within mathematical theories. Such programs are called theorem
provers. The first part of this chapter introduces the reader to the predicate
calculus, which is essentially a mathematical framework for the
statement of mathematical theories. Later sections discuss the binary
resolution procedure, a relatively simple procedure that has been the basis
for  many  theorem  provers.  Alternate  means  of  theorem  proving  are  also 
discussed.  This  chapter  concludes  by  showing  how  theorem  provers  can 
be  used  as  problem  solvers  for  problems  stated  in  the  state-space  para¬ 
digm,  how  they  can  be  used  to  construct  other  computer  programs  and 
prove  the  correctness  of  them,  and  how  analogies  can  be  used  to  im¬ 
prove  the  effectiveness  of  theorem  provers.  The  use  of  theorem  provers 
is one way to solve the “inference problem” in language-understanding
programs. 

FIRST-ORDER  PREDICATE  CALCULUS 

The invention of predicate calculus was one of the major advances
in the nineteenth-century development of mathematics. Although
Chapter 2 stressed the fact that any mathematical description is essentially a
finite description, this does not imply that all mathematical theories can
be described within the same finite framework, i.e., that there is a
mathematical theory of mathematical theories, or metamathematics.

Predicate calculus is part of the notation for current attempts to
develop a theory of metamathematics (note 6-1). Predicate calculus is
a language for the expression of mathematical theories. When a
mathematical theory is expressed in this language, it becomes a set of
statements (or sentences, or formulas), each of which says something about
the thing described by the theory.¹ Predicate calculus provides a set of
inference rules (for deriving new statements from the ones that are
given) and a set of symbols (to be used in making statements) that
seem to be adequate for most mathematical theories. Thus, to insure
generality, almost all AI work on theorem provers has been concerned
with developing machines that handle sets of statements in predicate
calculus (note 6-2).

In  fact,  almost  all  work  in  the  subject  of  theorem  proving  has  con¬ 
cerned  itself  with  theorems  stated  in  first-order  predicate  calculus,  which 
is  discussed  in  this  section.  Ultimately,  it  is  desirable  to  extend  theorem¬ 
proving  methods  to  higher-order  logics,  because  they  are  more  natural 
for  the  statement  of  most  mathematical  theories.  (The  difference  be¬ 
tween  first-  and  higher-order  logics  is  defined  below.)  Work  in  this 
direction  has  been  undertaken  (e.g.,  Robinson,  1969;  Hewitt,  1968  et 
seq.; Pietrzykowski and Jensen, 1972). The first-order predicate
calculus is general enough, though, so that if Church’s thesis is correct, then
all mathematical theories can be expressed using it. In principle, the AI
research that has been done in first-order predicate calculus is no less
general than any work that may be done in higher-order predicate
calculus. However, it is stressed again that, in practice, first-order
predicate calculus is not adequate for the statement of mathematical theories
about most real-world environments and problems. The first-order
expressions of such theories would be extremely long, complicated, and
inefficient (if they were at all obtainable), just as it would be extremely
complicated and inefficient to try to describe a real-world,
problem-solving procedure (e.g., SIN, DENDRAL) as a Turing machine. The AI
research on first-order predicate calculus has been valuable as a
relatively simple demonstration that computers can be made to “reason”
in  a  general  way  about  logical  problems.  (This  discussion  is  continued 
in  the  section  of  this  chapter  entitled  “Applications  to  Real-World 
Problems.”) 

1 Some mathematical theories are not descriptions of “real” things. For
example, group theory is a description of a class of mathematical theories
that are often used to describe many different things.

Briefly, then, predicate calculus provides a framework for making
and deriving statements that belong to mathematical theories. Our
conception is that statements express “logical thoughts” about things, that
statements should be made up of symbols, and that it should sometimes
be possible to prove or disprove the truth of a statement with respect
to a set of statements that are known to be true.

The  symbols  of  first-order  predicate  calculus  are: 

1. The variables x, y, z, ..., which will be called individual
symbols. A variable is a symbol that may represent any
object about which we can make a logical statement. The set
of things that a variable may represent in a mathematical
theory is known as the universe, or domain of discourse, of
that theory.

2. For each n ≥ 0, the n-ary function symbols f, g, h, ..., and
the n-ary predicate symbols P, Q, R, .... For any given n, the
number of such symbols may be zero or nonzero, finite or
infinite.²

3. The logic symbols ∀, ∃, ¬, ∧, ∨, which stand for “for all,”
“there exists,” “not,” “and,” and “or,” respectively.³

4. The punctuation symbols “,”, “(”, and “)”.

To  define  statements,  or  formulas,  we  must  also  define  terms  and 
atomic  formulas.  We  define  terms  as  follows: 

A  variable  is  a  term. 

If T is a sequence of n terms (n ≥ 0) and f is an n-ary
function symbol, then fT is a term.

No other expressions are terms.

We define atomic formulas as follows:

If T is a sequence of n terms (n ≥ 0) and P is an n-ary predicate
symbol, then PT is an atomic formula.

No  other  expressions  are  atomic  formulas. 

Finally, we define formulas as follows:

An atomic formula is a formula.

If U is a formula, then so is ¬U.

If U1, ..., Un are formulas, then ∧(U1, ..., Un) is a formula.

If U1, ..., Un are formulas, then ∨(U1, ..., Un) is a formula.

If U is a formula, then, for any variable x, ∀xU is a formula.

If U is a formula, then, for any variable x, ∃xU is a formula.

2 A complete formalization of first-order predicate calculus provides an
infinite number of variable, function, and predicate symbols. These are generally
written x_i, f_j, p_k, respectively, with the subscripts i, j, k being allowed to take
numerical values. However, the examples we present require only a few symbols.

3 We actually provide a formalization only for first-order predicate calculus
without equality; predicate calculus with equality contains an extra logic
symbol, “=”, which stands for “equals.” Theorem provers that work in theories with
equality have encountered difficulties; for a discussion, see Robinson (1970).

These definitions make possible some strings of symbols as
statements in the first-order predicate calculus and rule out others. Thus,

∀y(∧(P(x,y),Q(z)))

is  a  formula,  but 

)xQR(^)^z 

is  not. 

The first-order predicate calculus expression of a mathematical
theory consists of a set S of sentences, each of which is a formula
according to the rules given above. Such a set S is called a system. It is
possible for a system to correspond to many different mathematical
theories: A system can be taken as a description for many different
things, depending on how one “interprets” its formulas.

For example, in predicate calculus the most basic sort of statement
one can make is an atomic formula: R(x), P(x,y,z), and G(f(x,y)) are
all examples of atomic formulas. Each of these could “mean” anything,
depending on how one interprets the symbols involved. Thus, a
convenient interpretation of P(x,y,z) might be “the number x plus the
number y is the number z”; or, P(x,y,z) might mean “x and y are the
parents of z.” An interpretation of a set of atomic formulas is given
when we specify interpretations for the variable, function, and predicate
symbols used in that set of formulas.

An interpretation for a set of variable symbols is given by
specifying the universe of discourse; that is, the set of values they can assume.
The universe of discourse for a set of variable symbols is denoted by the
letter D. For example, D might be the set of numbers {-1, 0, +1}. If
D denotes a set, then D^n denotes the set of all n-tuples of D. Thus,

D^2 = {(-1,-1), (-1,0), (-1,+1), (0,-1), (0,0), (0,+1),
       (+1,-1), (+1,0), (+1,+1)}

(If D contains m elements, then D^n contains m^n elements.) An
interpretation of an n-ary predicate symbol P associates each element of D^n
with exactly one element of the set {true, false}. Thus, if an
interpretation of P gives (-1,0) the truth-value “false,” we say “P(-1,0) is
false.” An interpretation of an n-ary function f associates each element
of D^n with exactly one element of D. Thus, if an interpretation of f
gives to (+1,+1) the value +1, we say “f(+1,+1) is +1.” For n = 0
we define D^n to be the set containing (), the zero-tuple. By our
definition, the interpretation of a zero-ary function must be a constant;
rather than write expressions like “f( ),” we can usually denote
constants by the letters a, b, c.⁴

From our definition of a formula it is clear that there are many
types of formulas more complicated than atomic formulas, and that
these sentences can be constructed using the logical symbols; that is, the
operators ¬, ∨, ∧, and the quantifiers ∃, ∀. Explaining the meaning of
such a formula is rather straightforward. The operator ¬ produces the
negation of the statement it is applied to: thus, if P(-1,0) is false, then
¬P(-1,0) is true, and vice versa. If the operator ∨ is applied to a
sequence of statements, it produces their disjunction; that is, it produces
the statement that is true if and only if at least one of the statements in
the sequence is true. Applying the operator ∧ to a sequence of
statements produces their conjunction, that statement which is true if and
only if all the statements in the sequence are true. Thus, if P(-1,0) is
false, P(1,1) is true, and P(0,-1) is true, then

∨(P(-1,0),P(1,1)) is true.

∧(P(-1,0),P(1,1)) is false.

∨(P(1,1),P(0,-1)) is true.

∧(P(1,1),P(0,-1)) is true.

∨(P(-1,0),P(1,1),P(0,-1)) is true.

∧(P(-1,0),P(1,1),P(0,-1)) is false.

It is common to introduce a fourth logic symbol, →, to be read
“implies,” and to rewrite any formula of the form ∨(¬U,V) in the form
U→V. Of course such a formula may be true or false, regardless of the
form in which one writes it. The truth or falsity of U→V depends only
on the truth or falsity of U and V. Thus, according to our example,

P(1,1)→P(0,-1) is true.

P(1,1)→P(-1,0) is false.

P(-1,0)→P(1,1) is true.

In fact, if U is false, then U→V is true, regardless of the value of V.
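
The truth values listed above can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the dictionary P and the helper names Or, And, and Implies are assumptions introduced only for illustration, and P records just the three atomic truth values supposed in the example.

```python
# Truth values of the three atoms assumed in the running example.
P = {(-1, 0): False, (1, 1): True, (0, -1): True}

def Or(*values):
    """∨(U1,...,Un): true iff at least one argument is true."""
    return any(values)

def And(*values):
    """∧(U1,...,Un): true iff every argument is true."""
    return all(values)

def Implies(u, v):
    """U→V, read as shorthand for ∨(¬U, V)."""
    return Or(not u, v)

print(Or(P[(-1, 0)], P[(1, 1)]))        # True
print(And(P[(-1, 0)], P[(1, 1)]))       # False
print(Implies(P[(1, 1)], P[(-1, 0)]))   # False
print(Implies(P[(-1, 0)], P[(1, 1)]))   # True
```

Because Implies is defined through Or, a false antecedent makes the result true whatever the consequent is, which is the closing remark of the paragraph above.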

If the existential operator ∃ is used to quantify a variable in a
formula, it produces the statement that there is some value of the
variable in the universe of discourse for which the formula is true. Thus
∃xP(-1,x) means “there is a value of x such that P(-1,x) is true.”

4 Similarly, a zero-ary predicate is the same thing as a proposition.
Propositional (zero-order) predicate calculus is not considered in this book. See Suppes
(1957) for a good introduction.

When the universal operator ∀ is applied to a variable in a formula,
it produces the statement that “for all values of the variable in the
universe of discourse the formula is true.” ∀xP(-1,x) would mean “for
all values of x, P(-1,x) is true.” To find the truth value of a very
complex formula, with many logical operators and quantifiers, we start
with its simplest components and work outward. For example, suppose
∃xP(-1,x) is true, ∀xP(-1,x) is false, and P(1,1) is true; then


∧( ∨( ∃xP(-1,x), ∀xP(-1,x) ), P(1,1) )
        true        false       true
   \______________________/
             true
\_______________________________________/
                  true

is  true,  as  the  preceding  diagram  of  its  evaluation  shows.  Variables  that 
are  quantified  in  a  formula  are  said  to  be  bound,  while  those  which  are 
not  quantified  are  said  to  be  free.  First-order  predicate  calculus  does 
not  permit  predicate  or  function  symbols  to  be  quantified  or  to  be  used 
within  predicate  arguments  in  formulas;  both  of  these  things  are  natural 
for  human  mathematicians  and  may  happen  in  higher-order  predicate 
calculus.  Henceforth,  the  expression  “predicate  calculus”  will  be  used 
to  refer  only  to  first-order  predicate  calculus,  unless  otherwise  specified. 
However,  the  remarks  in  the  remainder  of  this  section  are  valid  for 
predicate  calculus  in  general. 
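
Over a finite universe of discourse, the quantifiers can be evaluated by simply trying every value. The sketch below is illustrative only and is not from the original text. It assumes the universe D = {-1, 0, +1} used earlier, and it assumes a hypothetical interpretation of P (the set TRUE_PAIRS) chosen so that the three suppositions of the example hold; the names exists and forall are likewise assumptions.

```python
D = (-1, 0, 1)  # universe of discourse, as in the text's example

# Hypothetical interpretation of P: the pairs listed are "true",
# every other pair of D x D is "false".
TRUE_PAIRS = {(1, 1), (0, -1), (-1, 1)}

def P(x, y):
    return (x, y) in TRUE_PAIRS

def exists(body):
    """∃x body(x): true if body holds for some x in D."""
    return any(body(x) for x in D)

def forall(body):
    """∀x body(x): true if body holds for every x in D."""
    return all(body(x) for x in D)

# Work outward from the simplest components, as in the diagram.
inner_exists = exists(lambda x: P(-1, x))             # ∃xP(-1,x)
inner_forall = forall(lambda x: P(-1, x))             # ∀xP(-1,x)
result = (inner_exists or inner_forall) and P(1, 1)   # ∧(∨(...), P(1,1))

print(inner_exists, inner_forall, result)  # True False True
```

This brute-force evaluation only works because D is finite; for an infinite universe of discourse no such exhaustive check is available.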

The interpretation of logic symbols is standard throughout
predicate calculus, so the interpretation of the formulas of a system really
depends only on the interpretation of the atomic formulas of that
system.  If  the  domain  of  discourse  of  the  variable  symbols  in  a  system 
has  been  specified,  and  interpretations  for  all  functions  and  predicates 
involved  in  the  system  have  been  given,  then  we  have  an  interpretation 
for  the  system  itself.  If  each  formula  in  a  system  turns  out  to  have  the 
value  “true”  with  respect  to  an  interpretation,  then  the  interpretation 
is  said  to  be  a  model  for  the  system.  A  system  may  have  zero,  one,  or 
many  models.  In  the  first  case  it  is  said  to  be  unsatisfiable;  in  the  other 
cases  it  is  said  to  be  satisfiable. 
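
To make the notions of model and satisfiability concrete, the following sketch enumerates every interpretation of a single unary predicate over a fixed finite universe and keeps those that make all formulas of a system true. It is an illustration under stated assumptions rather than a general procedure: the universe D, the predicate name P, and the three sample formulas are hypothetical, and the search covers only interpretations over this one D, whereas satisfiability in general ranges over all possible universes.

```python
from itertools import product

D = (-1, 0, 1)  # a fixed, finite universe of discourse (assumed)

def interpretations():
    """Yield every interpretation of a unary predicate P over D."""
    for values in product((True, False), repeat=len(D)):
        yield dict(zip(D, values))

def models(system):
    """Interpretations under which every formula of the system is true."""
    return [P for P in interpretations() if all(f(P) for f in system)]

# Three sample formulas, each written as a function of the interpretation.
some_P = lambda P: any(P[x] for x in D)       # ∃x P(x)
no_P = lambda P: all(not P[x] for x in D)     # ∀x ¬P(x)
all_P = lambda P: all(P[x] for x in D)        # ∀x P(x)

print(len(models([some_P])))         # 7: many models
print(len(models([all_P])))          # 1: exactly one model
print(len(models([some_P, no_P])))   # 0: no model over this D
```

The three results correspond to the "many," "one," and "zero" cases mentioned above, restricted to interpretations over the assumed universe.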

So far nothing has been said about rules of inference, that is, ways
of deriving one formula from other formulas. Suppose S is a system (set
of formulas) and U is a formula. Then we say that S logically implies U
if and only if U has the value “true” with respect to every model for S.
Trivially, every formula in S is logically implied by S. A rule of inference
is a procedure that, given a set S of formulas, may produce only
formulas that are logically implied by S. Different formalizations of
predicate calculus make use of different inference rules. Five of these
are:

EXPANSION RULE. If U and V are formulas and U is logically
implied by S, then ∨(U,V) is logically implied by S.

CONTRACTION RULE. If ∨(U,U) is logically implied by S, then
U is logically implied by S.

ASSOCIATIVE RULE. If ∨(U,∨(V,W)) is logically implied by S,
then ∨(∨(U,V),W) and ∨(U,V,W) are logically implied by
S, and vice versa.

CUT RULE. If ∨(U,V) and U→W are logically implied by S,
then ∨(V,W) is logically implied by S.

∃-INTRODUCTION RULE. If x is not free in V and U→V is
logically implied by S, then ∃xU→V is logically implied by S.

The next section presents a special inference rule, the resolution
procedure, which can be used in place of all of the preceding five rules.
If a logical implication of S can be derived using them, then it can be
derived using the resolution principle.

In general, any formula that is logically implied by a system S is
referred to as being a theorem of S. Many systems will contain and
logically imply an infinite number of formulas, and will be called
infinite systems. An attempt can be made to describe an infinite system
S by presenting some finite set Sa of formulas and comparing the set
Imp(Sa), of all formulas that are logically implied by Sa, with the set
Imp(S). If Imp(Sa) = Imp(S), then we say Sa is an axiomatization for
S, and we call the formulas in Sa axioms for S. Those formulas that are
theorems of Sa, but not axioms of Sa, are called consequences of S, with
respect to the axiomatization Sa.

Two things remain to be pointed out: First, using a given inference
rule (or set of inference rules) will not necessarily enable one to
produce all the theorems of a given system S. An inference rule is an
“if ... then ...” statement that enables one to establish the logical
implication of some formulas, given the logical implication of other
formulas. Given an initial set Sa, one can establish formulas not in Sa
as being logically implied by Sa. Given these formulas plus Sa, one can
establish more formulas as being logically implied by Sa, etc. However,
one cannot necessarily establish every formula that is logically implied
by Sa, using a given inference rule. In fact, for some systems, one can
show that there is no set of inference rules that will enable one to
establish in a finite number of steps each formula that is logically
implied by the system. Such a system is said to be undecidable.⁵
Artificial intelligence research cannot produce a consistent theorem prover (a set
of inference rules) that is capable of proving (establishing the logical
implication of) every theorem of an undecidable system.

5 For example, number theory is undecidable. The proof of the existence of
undecidable systems is Gödel’s famous result (Gödel, 1931). The existence of
undecidable systems is equivalent to the unsolvability of the Halting Problem
(see the section entitled “Limits to Computational Ability” in Chapter 2).

Second,  if  a  formula  is  not  logically  implied  by  a  given  system, 
it  may  still  be  true  for  some  models  of  the  system.  Formulas  that  are 
true  for  some  models  of  a  system  and  false  for  others  are  said  to  be 
contingent  (Kleene,  1967,  p.  29). 


THEOREM-PROVING  TECHNIQUES 
Resolution 

Groundwork 

If U is a formula and S is a set of formulas, and S logically
implies U, then U has the value “true” with respect to every model for S.
Thus, ¬U has the value “false” with respect to every model for S. Let
us consider the set S', which contains all the formulas in S and also
contains the formula ¬U. Does S' have a model?

The answer is no, for the following reason: If an interpretation is
a model for S', then every formula in S' must have the value “true”
with respect to that interpretation. Thus, the interpretation must be a
model for any subset of S' (for any collection of formulas that belong
to S'). In particular, the interpretation must be a model for S.
However, the formula ¬U must have the value “false” with respect to any
model for S. Thus, the formula ¬U must have the value “false” with
respect to any model for S'. However, ¬U is one of the formulas that
belongs to S', and so by definition it must have the value “true” with
respect to any model for S'. If S' had a model, ¬U would therefore have
both the value “true” and the value “false.” In predicate calculus an
interpretation can specify at most one value for any given formula.
Consequently, S' does not have a model. Thus, by definition, S' is said
to be unsatisfiable.

Similarly, we can show that if S' is unsatisfiable, and yet S is
satisfiable, then S logically implies U. Thus, if we want to show that a
satisfiable set of formulas S logically implies a formula U, it is sufficient
to show that the set S' (= S ∪ {¬U}) is unsatisfiable.

This is a technique that is often used by the theorem provers
developed in AI research. The theorem prover is given a set of
formulas Sa, which is called its data base. It is also given a formula U,
called the conjecture. The problem for the theorem prover is to prove
that U follows from Sa; that is, that U is logically implied by Sa. The
procedure followed by the theorem prover is to construct the formula
¬U, called the negated conjecture, and to attempt to show that the set
S'a, which contains the formulas in Sa and the negated conjecture, is
unsatisfiable. One way in which many theorem provers attempt to show
the unsatisfiability of a set of formulas is through the use of the
resolution procedure. This procedure was originally developed by J. A.
Robinson (1965 et seq.). Extensions and other theorem-proving
techniques have been developed by Wos, P. B. Andrews, G. A. Robinson,
Slagle, Sibert, Luckham, Nilsson, Prawitz, Loveland, Hayes, Kowalski,
Meltzer, Darlington, Guard, Gilmore, Gelernter, Reiter, Pietrzykowski,
Coles, Green, Kling, Hewitt, and others (see the Bibliography). Some of
the early work which led to the development of these techniques was
done by Davis, Quine, Dreben, Newell et al., and Wang. This section
describes the steps involved in the application of the binary resolution
principle. A more detailed presentation is given in Nilsson (1971).

Clause-Form  Equivalents 

The first step in the application of the resolution principle to a set
of formulas S'a is to replace each formula in S'a by an expression known
as its clause-form equivalent. Every formula in first-order predicate
calculus has a clause-form equivalent, which may be obtained by
applying the following sequence of operations: First, eliminate implication
signs. Wherever an expression of the form U→V occurs in a formula,
we replace it by ∨(¬U,V). For example, if we are finding the
clause-form equivalent of the formula

∀x∀y((A(x)→¬C(x,y))→¬∀x∃z∧(P(x,z),R(z)))

then this first step produces the formula

∀x∀y∨(¬∨(¬A(x),¬C(x,y)),¬∀x∃z∧(P(x,z),R(z)))

Our  next  step  is  to  reduce  the  scope  of  all  negation  signs,  making  each 
negation  sign  apply  to  at  most  one  predicate,  using  these  substitutions: 


Replace  ¬∨(A,...,B)  by  ∧(¬A,...,¬B)
Replace  ¬∧(A,...,B)  by  ∨(¬A,...,¬B)
Replace  ¬¬A          by  A
Replace  ¬(∀xA)       by  ∃x(¬A)
Replace  ¬(∃xA)       by  ∀x(¬A)

The application of this step to our example yields

∀x∀y∨(∧(A(x),C(x,y)),∃x∀z∨(¬P(x,z),¬R(z)))
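
The five replacement rules can be applied recursively. The sketch below is a hedged illustration, not the book's program. It assumes a hypothetical encoding in which an atom is a plain string and compound formulas are tuples such as ('not', F), ('and', F1, ..., Fn), ('or', F1, ..., Fn), ('forall', 'x', F), and ('exists', 'x', F); the function name reduce_negations is also an assumption.

```python
def reduce_negations(f):
    """Push negation signs inward until each applies to a single atom."""
    if isinstance(f, str):                       # an atom
        return f
    op = f[0]
    if op in ('and', 'or'):
        return (op, *map(reduce_negations, f[1:]))
    if op in ('forall', 'exists'):
        return (op, f[1], reduce_negations(f[2]))
    g = f[1]                                     # op == 'not'
    if isinstance(g, str):
        return f                                 # negated atom: nothing to do
    if g[0] == 'not':                            # ¬¬A  becomes  A
        return reduce_negations(g[1])
    if g[0] in ('and', 'or'):                    # ¬∨(...) and ¬∧(...)
        dual = 'or' if g[0] == 'and' else 'and'
        return (dual, *(reduce_negations(('not', h)) for h in g[1:]))
    dual = 'exists' if g[0] == 'forall' else 'forall'   # ¬∀xA and ¬∃xA
    return (dual, g[1], reduce_negations(('not', g[2])))

# The example formula after implication signs have been eliminated.
step1 = ('forall', 'x', ('forall', 'y', ('or',
         ('not', ('or', ('not', 'A(x)'), ('not', 'C(x,y)'))),
         ('not', ('forall', 'x', ('exists', 'z',
                  ('and', 'P(x,z)', 'R(z)')))))))

step2 = reduce_negations(step1)
print(step2)
```

Under this encoding, step2 corresponds to the formula displayed just above, with ∧(A(x),C(x,y)) as the first disjunct and ∃x∀z∨(¬P(x,z),¬R(z)) as the second.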




Our  third  step  is  to  standardize  variables;  that  is,  rename  the  variables 
in  our  formula  so  that  each  quantifier  binds  a  unique  variable  symbol. 
Within  the  scope  of  a  given  quantifier  the  variable  that  is  bound  by  that 
quantifier  is  really  a  dummy  variable,  and  it  doesn’t  matter  what  letter 
we  use  to  represent  it.  If  we  standardize  the  variables  in  our  example, 
we  obtain 

∀x∀y∨(∧(A(x),C(x,y)),∃u∀z∨(¬P(u,z),¬R(z)))

Next, we eliminate the existential quantifiers from our formula. To see
how this may be done, consider the expression

∀x∀y∃zP(x,y,z)

It is clear that the value of z which will satisfy P(x,y,z) may depend on
the values of x and y. We can indicate this possible dependence by
introducing an undefined function, known as a Skolem function, and writing
our expression as

∀x∀yP(x,y,f(x,y))

We may interpret the Skolem function f(x,y) as specifying for any
given values of x and y a value for z that “exists” and is such that
P(x,y,z). In general, we obtain the Skolem transform of a formula by
replacing each existentially quantified variable by a Skolem function of
those universally quantified variables that are bound by universal
quantifiers whose scopes include the existential quantifier being
eliminated. The function letter used to replace a given existentially quantified
variable must be different from those function letters (for either ordinary
or Skolem functions) that already occur in the formula. Eliminating
existential quantifiers, our original example now becomes

∀x∀y∨(∧(A(x),C(x,y)),∀z∨(¬P(g(x,y),z),¬R(z)))

where g(x,y) is the Skolem function introduced. Since all the variables
that occur in the formula are unique, we may move the universal
quantifiers to the leftmost part of the formula. This action is known as
converting the formula to prenex form. The formula now consists of a
quantifier string (or prefix) followed by a matrix. Our example becomes

∀x∀y∀z∨(∧(A(x),C(x,y)),∨(¬P(g(x,y),z),¬R(z)))

Our next step is to put the matrix in conjunctive normal form. Any
matrix can be written as the conjunction of a finite set of disjunctions,
atomic formulas, and negatives of atomic formulas. This may be done
by repeated application of the rule

Replace ∨(A,∧(B,...,C)) by ∧(∨(A,B),...,∨(A,C))

Thus, our example becomes

∀x∀y∀z∧(∨(A(x),¬P(g(x,y),z),¬R(z)),
        ∨(C(x,y),¬P(g(x,y),z),¬R(z)))

Finally, since all the variables in our formula are now universally
quantified, we may eliminate the universal quantifiers, and simply write
our example formula as

∧(∨(A(x),¬P(g(x,y),z),¬R(z)),
  ∨(C(x,y),¬P(g(x,y),z),¬R(z)))

These remarks indicate that we can make the following definitions:
A literal is either an atom (atomic formula) or the negation of an atom;
a clause is a disjunction of literals; a formula is a conjunction of clauses.
Disjunctions and conjunctions can be identified simply by their sets of
disjuncts and conjuncts, and we can speak of a literal L as being an
element of a clause C. The null disjunct nil, which is the disjunction of
the set containing no literals, always has the truth-value “false.”

Thus, a formula can be expressed as a set of clauses. The
“clause-form equivalent” of our example is

{{A(x),¬P(g(x,y),z),¬R(z)}, {C(x,y),¬P(g(x,y),z),¬R(z)}}

The clause-form equivalent of a set of formulas is the union of the sets
of clauses representing each formula (provided the variables used in
each formula are made distinct from those used in the other formulas).
As a final example, a clause-form equivalent for the set of formulas

S = {∀x∀y P(x)→N(y), ∀x∃z∧(Q(x,z),¬P(z)), ∀y∃xR(x,f(y,a))}

is the set of clauses

{{¬P(x),N(y)}, {Q(u,g(u))}, {¬P(g(u))}, {R(h(w),f(w,a))}}

(The Skolem functions are g(u) and h(w).)
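
The conjunctive-normal-form step and the passage from a matrix to a set of clauses can be sketched together. This is an illustrative sketch under assumptions, not the procedure of any particular theorem prover: it assumes a quantifier-free matrix whose negation signs already apply only to atoms, encodes atoms as strings, negated atoms as ('not', atom), and compound formulas as ('and', ...) or ('or', ...) tuples, and the function name clause_set is hypothetical.

```python
from itertools import product

def clause_set(m):
    """Clause set (a set of frozensets of literals) of a quantifier-free
    matrix built from 'and', 'or', atoms, and negated atoms."""
    if isinstance(m, str) or m[0] == 'not':      # a single literal
        return {frozenset([m])}
    parts = [clause_set(sub) for sub in m[1:]]
    if m[0] == 'and':                            # conjunction: pool the clauses
        return set().union(*parts)
    # Disjunction: distribute ∨ over ∧ by joining one clause from each part.
    return {frozenset().union(*choice) for choice in product(*parts)}

not_P = ('not', 'P(g(x,y),z)')
not_R = ('not', 'R(z)')

# Matrix of the running example after the quantifiers have been dropped.
matrix = ('or', ('and', 'A(x)', 'C(x,y)'), ('or', not_P, not_R))

for clause in clause_set(matrix):
    print(sorted(map(str, clause)))
```

Representing each clause as a frozenset reflects the remark above that disjunctions and conjunctions can be identified by their sets of disjuncts and conjuncts.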

Given a set of formulas Sa and a formula U, the theorem provers
we describe will attempt to show that Sa logically implies U by forming
the set S'a, which contains the formulas in Sa and the negated conjecture
¬U, and then attempting to show that S'a is unsatisfiable. The first step
in showing the unsatisfiability of S'a is to find the clause-form equivalent
for S'a. Having found the clause-form equivalent of S'a, the theorem
prover will attempt to find new clauses that are logically implied by the
clauses in S'a. If it can show that the empty clause, nil, is logically
implied by S'a, then it will have shown that S'a is unsatisfiable, since nil
is false for any interpretation. The theorem provers discussed here use
an inference rule known as the binary resolution principle to find clauses
that are logically implied by other clauses. The basic process used in
the binary resolution principle is known as the unification procedure.

The  Unification  Procedure 

To describe this procedure, some terminology must be introduced.
A substitution θ = {(T1,V1), (T2,V2), ..., (Tn,Vn)} is an operation
that, when applied to a clause C, yields another clause Cθ, obtained by
replacing each occurrence in C of the variables Vi by the corresponding
terms Ti (we require for any given substitution θ that i ≠ j implies
Vi ≠ Vj). For example, application of the substitution

θ = {(g(z),x),(a,y)}

to the clause

C = {¬P(x,y),Q(b,y)}

yields the clause

Cθ = {¬P(g(z),a),Q(b,a)}
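
Applying a substitution is a simple recursive replacement. The following Python sketch is an illustration, not the book's notation: it assumes a hypothetical encoding in which a variable is a string, a constant is a one-element tuple such as ('a',), a compound term or literal is a tuple ('f', arg1, ..., argn), and a negated literal is ('not', literal). A substitution is stored as a dictionary from variable to term, which is the text's (term, variable) pairs with the order reversed.

```python
def substitute(expr, theta):
    """Apply substitution theta (dict: variable -> term) to a term or literal."""
    if isinstance(expr, str):                    # a variable
        return theta.get(expr, expr)
    head, *args = expr                           # ('f', arg1, ..., argn)
    return (head, *(substitute(a, theta) for a in args))

def substitute_clause(clause, theta):
    """Apply theta to every literal of a clause (a set of literals)."""
    return frozenset(substitute(lit, theta) for lit in clause)

a, b = ('a',), ('b',)                            # constants
theta = {'x': ('g', 'z'), 'y': a}                # {(g(z),x), (a,y)}
C = frozenset({('not', ('P', 'x', 'y')), ('Q', b, 'y')})

C_theta = substitute_clause(C, theta)
print(C_theta)
```

Because the dictionary has one entry per variable, the requirement that the individual symbols Vi be distinct is enforced by the data structure itself.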

Although it is required in a substitution that all the individual symbols
Vi be distinct, it is not required that all the terms Ti be distinct. The
empty substitution ε = { } consists of not replacing anything, so that
for all C, Cε = C. If for two clauses C and C' there is some
substitution θ such that Cθ = C', then C' is said to be an instance of C. If
C' contains no variables, then C' is said to be a ground clause and to be
a ground instance of C. Thus, for the two clauses

C = {R(x,y,z),S(x,f(w))} and C' = {R(c,a,b),S(c,f(c))}

the substitution

θ = {(c,x),(a,y),(b,z),(c,w)}

makes Cθ = C', so C' is a ground instance of C. We say a clause C is
unifiable if there is a substitution θ such that Cθ contains only one literal.
Such a substitution is said to be a unifier for the clause C. Thus, the
substitution α = {(a,x),(b,y)} unifies the clause

C = {P(x,f(y),b),P(x,f(b),b)}

producing the clause

Cα = {P(a,f(b),b)}

If α is a unifier for a clause C, then the clause Cα is known as a
unification of C, and both C and the literals in C are said to be unified by α.

A given clause may have several unifiers. Thus, the clause C
considered above has, besides the unifier α, the unifier β = {(b,y)}, which
produces the unification

Cβ = {P(x,f(b),b)}

The unifier β is, in a sense, “more general” than the unifier α because
β does not specify a substitution for the variable symbol x. The resolution
procedure described in the next section works most successfully if we
are able to find very general unifiers for clauses, and it is the purpose
of the unification procedure to find the most general unifier (mgu) for
any given clause, provided the clause can be unified. To state the
unification procedure, we need to define the notion of a “composition”
of substitutions.

The composition of two substitutions α and β is denoted by the
expression αβ and is that substitution obtained by applying β to the
terms of α and then adding to α any (Ti,Vi) pairs in β that have variable
symbols Vi not occurring among the variables of α. Thus, if

and 

β = {(a,x),(b,w),(c,z)}

then 


αβ = {(g(a,y),z),(f(a,b),w),(a,x)}

It can be shown that applying α and β successively to any clause C
yields the same result as applying αβ to C. Thus, (Cα)β = C(αβ).
Similarly, one can show that composition is associative; that is, that for
any clause C and substitutions α, β, and γ, we have (Cαβ)γ = (Cα)βγ.
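
The definition of composition translates directly into code. The sketch below is illustrative only: it reuses the assumed encoding of variables as strings and terms as tuples, stores a substitution as a dictionary from variable to term, and uses a hypothetical α = {(g(x,y),z)} chosen purely for the demonstration; it is not the α of the text's own example.

```python
def substitute(expr, theta):
    """Apply substitution theta (dict: variable -> term) to a term or literal."""
    if isinstance(expr, str):                    # a variable
        return theta.get(expr, expr)
    return (expr[0], *(substitute(a, theta) for a in expr[1:]))

def compose(alpha, beta):
    """Composition of alpha followed by beta, per the definition above."""
    result = {v: substitute(t, beta) for v, t in alpha.items()}  # apply beta to alpha's terms
    for v, t in beta.items():
        result.setdefault(v, t)                  # add pairs whose variable is new
    return result

a, b, c = ('a',), ('b',), ('c',)                 # constants
alpha = {'z': ('g', 'x', 'y')}                   # hypothetical: {(g(x,y),z)}
beta = {'x': a, 'w': b, 'z': c}                  # {(a,x),(b,w),(c,z)}

alpha_beta = compose(alpha, beta)
print(alpha_beta)

# Applying alpha and then beta agrees with applying the composition once.
literal = ('P', 'z', 'x', 'w')
print(substitute(substitute(literal, alpha), beta) == substitute(literal, alpha_beta))
```

Note that the pair (c,z) of β is dropped, because z already occurs among the variables of α.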

A substitution λ that unifies a clause C is said to be most general
if, given any other unifier θ of C, one can always find a substitution γ
such that λγ = θ; that is, such that Cλγ = Cθ. The unifications of a
clause C produced by its most general unifiers are all alphabetic
variants of each other; that is, each of them may be obtained from any of
the others by a substitution of variable symbols for variable symbols.
Thus C = {P(x,f(y),b),P(b,f(w),b)} has the most general unifications
{P(b,f(y),b)} and {P(b,f(w),b)}, and the second unification may be
obtained from the first by application of the substitution {(w,y)}. The
most general unifiers for these unifications are {(b,x),(y,w)} and
{(b,x),(w,y)}, respectively.

We can now state a unification procedure that finds the most
general unifier for any given clause C if C is unifiable, and reports failure if
C is not unifiable. The unification procedure makes use of two
“program variables,” λk and k, which are initially set to ε and 0; throughout
its operation, unification alters their values. Thus, initially k = 0 and λ0 = ε.
The eventual value of λk is the most general unifier of the given clause
C  (subject  to  our  comments  about  alphabetic  variants)  if  C  is  unifiable, 
and  is  E  if  C  is  not  unifiable.  The  steps  of  the  procedure  are  as  follows: 

1. If Cλk contains only one literal, then return λk as the mgu
for C and stop.

2. If Cλk contains more than one literal, find the first symbol
position for each literal in which not all literals have the
same symbol. For example, if

Cλk = {P(g(x),a,f(u,v)),P(u,a,z)}
         ↑                ↑

then the first symbol positions are as marked by the arrows.

3. Construct the disagreement set for Cλk, which contains the
well-formed expressions (terms or literals) from each literal
in Cλk that begin at the marked positions. Thus, the
disagreement set for the example is {g(x),u}.

4. If there exist two terms sk and tk in the disagreement set
such that sk is a variable symbol and tk does not contain sk,
then take any two such terms sk and tk, replace λk by
λk+1 = λk{(tk,sk)}, replace k by k+1, and go to step 1.
For our example, sk may be taken to be u, and tk may be
taken to be g(x). Thus,

λk+1 = λk{(g(x),u)}

and

Cλk+1 = {P(g(x),a,f(g(x),v)),P(g(x),a,z)}.

5. If there do not exist two such terms sk and tk in the
disagreement set, then report that C cannot be unified and stop.

No proof will be offered that the unification procedure does in
fact find the most general unifier (see J. A. Robinson, 1965, or
Luckham, 1967). For the example shown in the explication of the procedure,
if C were initially the clause {P(g(x),a,f(u,v)),P(u,a,z)}, then the
procedure would return the mgu λ2 = {(g(x),u),(f(g(x),v),z)}, and the
most general unification of C would be

Cλ2 = {P(g(x),a,f(g(x),v))}

Examples of some clauses and their most general unifications
conclude this discussion.

Clauses                              Most General Unifications

{P(x),P(a)}                          {P(a)}
{Q(x,y,a),Q(x,y,b)}                  Not unifiable
{R(z,f(x),y),R(a,y,f(x))}            {R(a,f(x),f(x))}
{P(x,z,y),P(u,a,a),P(w,u,a)}         {P(a,a,a)}
{P(f(x)),P(x)}                       Not unifiable
{P(f(x),y,g(y)),P(f(x),z,g(x))}      {P(f(x),x,g(x))}
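
The five steps of the unification procedure can be sketched as follows. This is a minimal illustration under assumptions, not Robinson's or Luckham's program: variables are encoded as strings, constants as one-element tuples such as ('a',), and terms or literals as tuples ('f', arg1, ..., argn); substitutions are dictionaries from variable to term; and all function names are hypothetical. The function mgu returns None to report failure, a convention chosen here for convenience.

```python
def substitute(expr, theta):
    """Apply substitution theta (dict: variable -> term) to expr."""
    if isinstance(expr, str):                    # variables are strings
        return theta.get(expr, expr)
    return (expr[0], *(substitute(a, theta) for a in expr[1:]))

def compose(alpha, beta):
    """Composition of alpha followed by beta."""
    result = {v: substitute(t, beta) for v, t in alpha.items()}
    for v, t in beta.items():
        result.setdefault(v, t)
    return result

def occurs(var, term):
    """True if the variable var appears anywhere inside term."""
    if isinstance(term, str):
        return var == term
    return any(occurs(var, a) for a in term[1:])

def disagreement(exprs):
    """Steps 2-3: subexpressions at the first position where exprs differ."""
    if all(e == exprs[0] for e in exprs):
        return None
    if any(isinstance(e, str) for e in exprs) or \
            len({(e[0], len(e)) for e in exprs}) > 1:
        return exprs
    for column in zip(*(e[1:] for e in exprs)):
        found = disagreement(list(column))
        if found is not None:
            return found

def mgu(literals):
    """Return a most general unifier of the literals, or None on failure."""
    theta, current = {}, list(literals)
    while True:
        d = disagreement(current)
        if d is None:                            # step 1: a single literal remains
            return theta
        pair = next(((s, t) for s in d if isinstance(s, str)
                     for t in d if t != s and not occurs(s, t)), None)
        if pair is None:                         # step 5: cannot be unified
            return None
        step = {pair[0]: pair[1]}                # step 4: extend the unifier
        theta = compose(theta, step)
        current = [substitute(lit, step) for lit in current]

a = ('a',)
clause = [('P', ('g', 'x'), a, ('f', 'u', 'v')), ('P', 'u', a, 'z')]
print(mgu(clause))
print(mgu([('P', ('f', 'x')), ('P', 'x')]))     # None: x occurs in f(x)
```

For the clause used in the explication of the procedure, the returned dictionary corresponds to λ2 = {(g(x),u),(f(g(x),v),z)}. Which of several alphabetic variants is returned for other clauses depends on the order in which the literals are supplied.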


The  Binary  Resolution  Procedure 

An  inference  procedure  used  by  many  theorem-proving  programs 
may  now  be  stated.  This  procedure  is  known  as  the  binary  resolution 
procedure,  and  it  constitutes  an  inference  rule  that  enables  us  to  con¬ 
struct  some  of  the  clauses  that  are  logically  implied  by  any  given  set 
of  clauses.  The  resolution  process  will  be  explained  first,  followed  by 
a  description  of  its  use  in  a  procedure  to  prove  that  a  given  set  of  clauses 
is  unsatisfiable,  when  the  set  is  in  fact  unsatisfiable. 

Suppose we wish to find clauses that are logically implied by two
given clauses, say, C1 and C2. Let us denote the literals belonging to
C1 by Li and those belonging to C2 by Mj. Thus, C1 = {Li} and
C2 = {Mj}. Let us suppose that C1 and C2 have no variables in
common (we can always rename the variables in one or the other clause
to accomplish this). Let {li} ⊆ {Li} and {mj} ⊆ {Mj} be two subsets
of {Li} and {Mj} such that a most general unifier λ exists for the
set of literals {li} ∪ {¬mj}. Then the clause

C3 = ({Li} − {li})λ ∪ ({Mj} − {mj})λ

is logically implied by C1 and C2. Depending on how we choose {li}
and {mj} we may obtain other clauses that are logically implied by C1
and C2. It may not be possible to choose {li} and {mj} such that
{li} ∪ {¬mj} can be unified. However, if {li} and {mj} can be so chosen,
then the two clauses C1 and C2 are said to resolve and to be parent
clauses for the resolvent(s) C3. The process of choosing {li} and {mj}
sets to find resolvents for two given parent clauses is called the
resolution process. Since the clauses C1 and C2 we consider are always
finite, there are only a finite number of ways we can choose {li} and
{mj}. Thus, C1 and C2 can have only a finite number of resolvents.

As  an  example,  consider  the  two  clauses 
C1 = {P(f(x),y), P(z,f(a)), Q(u)}

and

C2 = {¬P(y,z), ¬Q(f(x))}

If we choose {li} = {P(f(x),y)} and {mj} = {¬P(y,z)}, then
{li} ∪ {¬mj} = {P(f(x),y), P(y,z)}, which has the mgu

λ = {(f(x),y),(f(x),z)}

The corresponding resolvent for C1 and C2 is

C3 = {P(z,f(a)), Q(u), ¬Q(f(x))}λ
   = {P(f(x),f(a)), Q(u), ¬Q(f(x))}

Similarly, if we choose {li} = {P(f(x),y), P(z,f(a))} and {mj} =
{¬P(y,z)}, then we find that {li} ∪ {¬mj} = {P(f(x),y), P(z,f(a)),
P(y,z)} has the mgu λ' = {(f(a),y),(f(a),z),(a,x)}, and we obtain
the corresponding resolvent

C3 = {Q(u), ¬Q(f(x))}λ' = {Q(u), ¬Q(f(a))}

Altogether,  Ci  and  C2  have  four  different  resolvents,  of  which  three 
may  be  obtained  by  resolving  on  P  and  one  may  be  obtained  by  resolv¬ 
ing  on  Q. 

Again, let C1 = {¬P(x),R(x)} and C2 = {¬R(x),Q(x)}. If we
choose {li} = {R(x)} and {mj} = {¬R(x)}, we obtain the resolvent
C3 = {¬P(x),Q(x)}. These three clauses correspond respectively to
the predicate calculus formulas

∀x(P(x)→R(x))

∀x(R(x)→Q(x))

∀x(P(x)→Q(x))

The  English  meanings  of  these  formulas  are 

“Everything with property P has property R.”

“Everything with property R has property Q.”

“Everything with property P has property Q.”

In  this  case  it  should  be  intuitively  clear  that  the  third  statement  is 
logically  implied  by  the  first  two. 
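
A single resolution step of this kind can be sketched in a few lines.
The sketch below makes a simplifying assumption that the text does
not: the complementary literals are taken to be already identical
apart from the negation sign, so that no unifier has to be computed.
Literals are plain strings, and the prefix "~" is an assumed notation
for negation.

```python
def negate(lit):
    """Complement of a literal; '~' marks negation (assumed notation)."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolvents(c1, c2):
    """All binary resolvents of two clauses given as frozensets of strings.

    Only literals that are exact complements are resolved upon; a full
    implementation would first look for a most general unifier.
    """
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out


c1 = frozenset({"~P(x)", "R(x)"})
c2 = frozenset({"~R(x)", "Q(x)"})
print(resolvents(c1, c2))
```

Resolving a clause {P(x)} against {~P(x)} with the same function
yields the empty frozenset, which plays the role of the empty clause.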

Given a pair of clauses C1 and C2, we obtain their resolvents by
attempting to apply the resolution process to C1 and C2 with respect
to each possible combination of their subsets {li} and {mj}. (A
computer program can be designed not to investigate some combinations
that obviously will not work, such as those combinations that use more
than one predicate symbol.) If one of the resolvents of the clauses C1
and C2 is the empty clause nil, then we know that C1 and C2 cannot
both be satisfied (either one of them might be satisfiable, but there is
no model which will make them both true). For example, the resolvent
of C1 = {P(x)} and C2 = {¬P(x)} is the empty clause.

The binary resolution procedure for showing that a set S of clauses
is unsatisfiable can now be very simply stated. Let S be a set of clauses
{C1,C2, ... ,Cn}. We apply the resolution process successively to each
pair of clauses Ci, Cj (i ≠ j), and place any resolvents obtained in a new
set R(S). When we have gotten all possible resolvents from S (for any
finite set of finite clauses there are only finitely many possible resolvents,
and we can tell when all the possibilities have been tried), we apply
the binary resolution procedure to the set R1(S) = S ∪ R(S), which
contains all the clauses in S plus all their resolvents. This yields the set
of all resolvents of R1(S), which is denoted by R(R1(S)). Next we
form the set R2(S) = R1(S) ∪ R(R1(S)), which contains all the
clauses in R1(S) plus all their resolvents. We apply the resolution
procedure to R2(S) to obtain the set R(R2(S)), and we form the set
R3(S) = R2(S) ∪ R(R2(S)). In general,

Ri+1(S) = Ri(S) ∪ R(Ri(S))

where S is our initial set of clauses; if X is a set of clauses, then R(X)
denotes the set of all resolvents of the clauses in X. The set Ri(S) is
called the ith level of clauses that are logically implied by S. The
resolution procedure consists of developing in succession the levels of
clauses that are implied by S until either we run out of computation
time (in which case the answer is “no proof found”) or the empty clause
nil is produced as a resolvent in some level. If nil is ever produced, then
we know that S is unsatisfiable. The resolution procedure corresponds
to a breadth-first development of the clauses that are logically implied
by S.
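
The development of successive levels can be sketched as follows. This
is an illustration under stated assumptions, not the procedure as any
particular program implements it: clauses are ground (variable-free),
literals are strings with "~" as an assumed negation prefix, and a
level bound stands in for “running out of computation time.” Because
the clauses are ground, the sketch can also stop when a level adds
nothing new.

```python
from itertools import combinations


def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve_pair(c1, c2):
    """Resolvents of two ground clauses (frozensets of literal strings)."""
    return {frozenset((c1 - {lit}) | (c2 - {negate(lit)}))
            for lit in c1 if negate(lit) in c2}


def refute(clauses, max_levels=10):
    """Return the level at which the empty clause appears, else None."""
    level = set(clauses)                      # corresponds to S
    for i in range(1, max_levels + 1):
        new = set()
        for c1, c2 in combinations(level, 2):
            new |= resolve_pair(c1, c2)       # R applied to the current level
        if frozenset() in new:
            return i
        if new <= level:                      # saturated: no proof exists
            return None
        level |= new                          # next level = level ∪ R(level)
    return None                               # "no proof found"


S = [frozenset({"P(a)"}),
     frozenset({"~P(a)", "Q(a)"}),
     frozenset({"~Q(a)", "R(a)"}),
     frozenset({"~R(a)"})]
print(refute(S))
```

For this four-clause set the empty clause does not appear among the
resolvents of S itself; it appears when the resolvents of the first
level are formed.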

A graph that (1) associates the empty clause with one of its
nodes, (2) associates only the ancestors (parents, parents of parents,
etc.) of the empty clause with the rest of its nodes, and (3) connects
each node only to those clauses that are its parents or of which it is a
parent (as determined by the resolution procedure) is called a
refutation graph of S and constitutes a simple proof that S is
unsatisfiable. Figure 6-1 shows a refutation graph for the unsatisfiable
set of formulas

S = {∀x P(x), ∀x(P(x)→Q(x)), ∀x(Q(x)→R(x)), ¬R(a)}

which is equivalent to the set of clauses

{{P(x)}, {¬P(y),Q(y)}, {¬Q(z),R(z)}, {¬R(a)}}


Figure 6-1. A refutation graph.

Of  course  not  all  the  clauses  that  are  logically  implied  by  S  are  shown 
in  a  refutation  graph. 

It  is  possible  to  prove  that  the  resolution  procedure  is  a  valid 
way of showing that a set of clauses is unsatisfiable (i.e., if nil is
produced by the procedure, then the set must be unsatisfiable; if the
corresponding set of formulas can be shown unsatisfiable, using the five
inference  rules  presented  in  the  first  section  of  this  chapter,  then  nil 
will  be  produced),  but  space  does  not  permit  the  presentation  of  such 
a  proof.  The  reader  is  referred  to  (Nilsson,  1971,  pp.  181-183). 

Summary 

To summarize the theorem-proving technique described above,
the theorem prover

1. is given a set Sa of axioms and a conjecture C,

2. forms the negated conjecture ¬C,

3. forms the set of formulas S'a, consisting of Sa and ¬C,

4. produces the clause-form equivalent of S'a,

5. applies the resolution procedure to this set of clauses until
it either runs out of time or produces the empty clause nil,

6. constructs a refutation graph if nil is produced and
announces that it has found a proof for the conjecture.

A theorem prover that uses the resolution procedure is said to be
resolution-based.
Heuristic Search Strategies

Extensions 

Methods  of  theorem  proving  such  as  the  resolution  procedure  are 
not  practically  applicable  to  systems  with  more  than  a  few  axioms. 
They correspond to a simple “breadth-first” development of the
consequences of the system. It is necessary to use resolution in a selective
manner, if one wishes to develop a theorem prover that can operate with
systems of more than about ten axioms. Various selective techniques for
resolution-based theorem provers have been developed. These
techniques are generally known as heuristic search strategies because of
the  way  in  which  they  alter  a  theorem  prover’s  development  of  the 
consequences  of  a  given  system.  Using  such  strategies,  it  is  possible 
for  a  theorem  prover  to  prove  fairly  difficult  theorems  in  systems  having 
up  to  two  dozen  axioms  (the  proof  of  such  a  theorem  might  be  50 
steps  long).  This  section  reviews  some  of  the  currently  used  search 
strategies  for  theorem  provers.  These  strategies  fall  into  three  basic 
categories:  refinement  strategies,  simplification  strategies,  and  ordering 
strategies  (see  Nilsson,  1971). 

Simplification  Strategies 

Often  it  is  possible  to  eliminate  literals  or  clauses  from  a  set  of 
clauses,  in  a  manner  that  does  not  affect  the  unsatisfiability  of  the  set 
of  clauses.  (That  is,  if  the  set  is  unsatisfiable  before  simplification,  it  will 
be  unsatisfiable  afterward.  Conversely,  if  a  set  is  satisfiable  before 
simplification,  it  will  be  satisfiable  afterward.)  When  this  can  be  done,  it 
will  reduce  the  rate  at  which  irrelevant  clauses  are  generated.  Three 
ways  of  simplifying  a  set  of  clauses  are  to  eliminate  tautologies,  evaluate 
predicates  where  possible,  and  to  eliminate  clauses  that  are  subsumed 
by  other  clauses. 

A tautology is a statement of the form “A or not A.” In predicate
calculus every tautology is trivially true. The clause-form equivalent of
a tautology is a clause that contains both a literal and the complement
of that literal. If such a clause belongs to a set of clauses, then it may
be eliminated from the set without affecting the unsatisfiability of the
set. Thus, clauses like

{Q(x,y), ¬R(z), ¬Q(x,y)} and {P(f(x)), ¬P(f(x))}

may be eliminated.
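
Tautology elimination is simple enough to show directly. In the
sketch below, literals are strings and "~" is an assumed prefix for
negation; neither convention comes from the text.

```python
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit


def is_tautology(clause):
    """True if the clause contains some literal together with its complement."""
    return any(negate(lit) in clause for lit in clause)


def drop_tautologies(clauses):
    """Keep only the clauses that are not tautologies."""
    return [c for c in clauses if not is_tautology(c)]


clauses = [frozenset({"P(f(x))", "~P(f(x))"}),
           frozenset({"P(x)", "~Q(x)"})]
print(drop_tautologies(clauses))
```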

Sometimes it is possible to evaluate the truth value of a literal
immediately after the clause containing it is generated. In such a case, if
the literal has the value “true,” then the entire clause may be eliminated
without affecting the unsatisfiability of the set S of clauses. If the literal
has the value “false,” then it may be eliminated from the clause in
which it occurs. Generally, it is possible to evaluate a literal only if one
has some specific information about the nature of its predicate. Thus,
one might have a predicate P(x,y,z) equivalent to “the sum of
number x and number y is number z.” In such a case, literals using this
predicate can be immediately evaluated by machine.
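
A sketch of predicate evaluation for the sum predicate follows. The
representation is an assumption made for illustration: a literal is a
triple (positive, predicate name, arguments), the sum predicate is
named "SUM", and a literal is evaluated only when all three of its
arguments are numbers.

```python
def evaluate(lit):
    """Return True or False if the literal can be evaluated, else None."""
    positive, pred, args = lit
    if pred == "SUM" and all(isinstance(a, (int, float)) for a in args):
        x, y, z = args
        value = (x + y == z)
        return value if positive else not value
    return None  # nothing is known about this predicate or its arguments


def simplify(clause):
    """Return None if the clause may be discarded; otherwise the clause
    with every literal known to be false removed."""
    kept = []
    for lit in clause:
        v = evaluate(lit)
        if v is True:
            return None        # a true literal makes the whole clause true
        if v is None:
            kept.append(lit)   # cannot be evaluated, so keep it
    return kept


q = (True, "Q", ("x",))
print(simplify([(True, "SUM", (2, 3, 5)), q]))   # clause discarded
print(simplify([(True, "SUM", (2, 3, 6)), q]))   # false literal removed
```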

A clause C1 = {Li} is said to subsume a clause C2 = {Mj} if
there is a substitution θ such that {Li}θ is a subset of {Mj}. For
example, {P(x),Q(a)} subsumes {P(f(a)),Q(a),R(y)}. If C1 and C2
are clauses in S and C1 subsumes C2, then C2 may be eliminated from S
without affecting the unsatisfiability of S. Intuitively, C1 is “more
general” than C2.
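
The subsumption test can be sketched as a small backtracking search.
The conventions are assumptions for illustration only: terms and
literals are tuples headed by their symbol, the letters u through z
are variables, and the variables of the subsumed clause are treated
as if they were constants (one-way matching).

```python
VARIABLES = {"u", "v", "w", "x", "y", "z"}  # assumed naming convention


def is_var(t):
    return isinstance(t, str) and t in VARIABLES


def match(pattern, target, theta):
    """Extend theta so that pattern with theta applied equals target,
    or return None. Only variables of the pattern are bound."""
    if is_var(pattern):
        if pattern in theta:
            return theta if theta[pattern] == target else None
        return {**theta, pattern: target}
    if isinstance(pattern, tuple) and isinstance(target, tuple):
        if pattern[0] != target[0] or len(pattern) != len(target):
            return None
        for p, t in zip(pattern[1:], target[1:]):
            theta = match(p, t, theta)
            if theta is None:
                return None
        return theta
    return theta if pattern == target else None


def subsumes(c1, c2, theta=None):
    """True if one substitution maps every literal of c1 onto a literal of c2."""
    theta = {} if theta is None else theta
    if not c1:
        return True
    first, rest = c1[0], c1[1:]
    for lit in c2:
        t = match(first, lit, theta)
        if t is not None and subsumes(rest, c2, t):
            return True
    return False


general = [("P", "x"), ("Q", "a")]
specific = [("P", ("f", "a")), ("Q", "a"), ("R", "y")]
print(subsumes(general, specific), subsumes(specific, general))
```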

Usually, it is wise to eliminate tautologies and, where possible,
evaluate predicates before eliminating by subsumption. Subsumed
clauses should be eliminated only after each level Ri(S) of S has been
completely developed (see Kowalski, 1970a,b).

Refinement  Strategies 

As  we  have  indicated,  the  resolution  principle  presented  in  the 
preceding  section  can  be  generalized.  It  can  also  be  modified  to  produce 
new  inference  rules  that  restrict  the  possible  clauses  in  S  which  may 
be  resolved,  beyond  the  simple  requirement  that  they  be  resolvable. 
Such a modification is known as a refinement strategy, and is equivalent
to a new inference rule R_C that permits resolutions only between clauses
that satisfy a refinement criterion C. A refinement strategy is said
to use resolution relative to C. Many different refinement strategies for
resolution have been developed. One of these, the ancestry-filter (AF)
strategy, will now be presented. For a discussion of other strategies, see
Nilsson (1971).

If clauses C1 and C2 can be resolved to form clause C3, we say
C1 and C2 are parents of C3. Given a sequence C1, C2, ..., Cn such that,
for 1 ≤ i < n, Ci is a parent of Ci+1, we say C1 is an ancestor of Cn
and Cn is a descendant of C1 (refer to the terminology for graphs in
Chapter 3).

The refinement criterion for ancestry-filtered resolution can now
be stated. Two clauses will be resolved if and only if either

i) Both belong to Sa.

ii) One belongs to Sa and the other is a descendant of a clause
in Sa.

iii) One is an ancestor of the other.
The use of this criterion in effect gives us a new inference rule, which is
denoted as R_AF. If S is a set of clauses, then R_AF(S) denotes the set of
all resolvents between pairs of clauses that belong to S and satisfy the
ancestry-filter criterion. Thus, defining

R_AF^1(S) = S ∪ R_AF(S)

and

R_AF^(n+1)(S) = R_AF^1(R_AF^n(S)),

it can be proved that if Sa is unsatisfiable, then there is some n such that
R_AF^n(Sa) contains the empty clause. Conversely, if Sa is satisfiable, then
there is no n such that R_AF^n(Sa) contains the empty clause. Thus,
resolution relative to ancestry filtering can be used in place of ordinary
resolution. Figure 6-2 shows a refutation of a simple set of clauses, produced
according to the ancestry-filter refinement of resolution.
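
The three conditions of the ancestry-filter criterion can be checked
mechanically if the theorem prover records the parents of every
clause it derives. The sketch below assumes such a record: clauses
are referred to by hypothetical labels, `base` is the original set
Sa, and `parents` maps each derived clause to its two parents.

```python
def ancestors(c, parents):
    """All ancestors of clause c (parents, parents of parents, etc.)."""
    seen, stack = set(), list(parents.get(c, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def af_allows(c1, c2, base, parents):
    """True if the pair satisfies one of the three conditions stated above."""
    if c1 in base and c2 in base:                      # condition (i)
        return True
    for a, b in ((c1, c2), (c2, c1)):
        if a in base and ancestors(b, parents) & base:  # condition (ii)
            return True
    return (c1 in ancestors(c2, parents)               # condition (iii)
            or c2 in ancestors(c1, parents))


base = {"A", "B", "C"}
parents = {"D": ("A", "B"), "E": ("D", "C"), "F": ("B", "C")}
print(af_allows("D", "C", base, parents))   # a base clause and a descendant
print(af_allows("D", "F", base, parents))   # two unrelated derived clauses
```

The filter costs one ancestor lookup per candidate pair; whether that
is cheaper than generating the clauses it excludes depends on the
problem.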


Figure 6-2. A search for refutation using the AF strategy.
(Nilsson, 1971, reprinted with permission.)


In practice it is possible to add further restrictions to the AF
strategy. Any such restriction will, in effect, add to the refinement
criterion used by the theorem prover. Before using a refinement criterion
it is important to prove the completeness of resolution with respect to
that criterion; that is, one must show that if S is unsatisfiable, then there
is some n such that R_C^n(S) contains the empty clause; if S is satisfiable,
then there is no n such that R_C^n(S) contains the empty clause. It is also
important to show that the total cost of using the criterion (in terms of
computation time and memory space used by the computer) is less than
the cost of generating, storing, and resolving the clauses eliminated by
the criterion. The AF refinement strategy for resolution generally tends
to produce much deeper but less broad searches than would be
produced by unrefined resolution. Other refinement strategies include the
“set-of-support” strategy and “model” strategies.

Ordering  Strategies 

Ordering  strategies  are  the  most  “heuristic”  of  the  three  types  of 
search strategies considered in this section. Ordering strategies
correspond to the use of evaluation functions for searching state-space
graphs (discussed in Chapter 3). A given ordering strategy does not
necessarily prohibit resolution between certain pairs of clauses (as do
refinement strategies). Rather, an ordering strategy provides that
resolution between certain pairs of clauses shall be performed before
resolution between other pairs.

Suppose we have an inference rule R (possibly a refinement of
resolution). The search for a refutation of a set of clauses Sa
corresponds to the development of successive levels R^i(Sa), each level
containing the preceding ones as subsets, until a level R^n(Sa) is produced,
which contains the empty clause. In other words, it is a breadth-first
search. Up to now the strategies discussed are means of narrowing the
breadth of the refutation search done by a theorem prover (see Fig.
6-2).

Theorem  provers  that  use  ordering  strategies  do  not  do  a  breadth- 
first search, although they may make use of the simplification and
refinement strategies discussed previously. A theorem prover that uses an
ordering  strategy  selectively  generates  portions  of  the  levels  below  an 
initial  set  of  clauses  Sa,  in  a  depth-first  manner.  If  the  first  sequence  of 
portions  that  it  generates  down  to  some  level  n  does  not  produce  the 
empty  clause,  then  it  will  “back  up”  and  try  generating  another  sequence 
of  portions  (see  Fig.  6-3).  Perhaps  the  two  most  common  ordering 
strategies  are  the  unit-preference  strategy  and  the  fewest-components 
strategy. 

A unit is a clause that contains only one literal. Similarly, a
doubleton contains two literals, etc. In the unit-preference strategy the
theorem prover first attempts to resolve units against units (if this
succeeds, then it has produced the empty clause), then units against
doubletons, then units against tripletons, etc. Thus, given a set S of
clauses, the theorem prover generates the “unit-preference” portion of
R^1(S). It then generates the unit-preference portion of R^2(S), etc., and
continues until it either produces a null clause or reaches its level
bound n; that is, until it generates the unit-preference portion of
R^n(S). If it reaches its level bound without producing the empty clause,
it might back up to S and begin generating “doubleton-preference”
portions of the levels R^i(S) below S, but within the same level bound.
Similarly, it could continue to “tripleton-preference” portions, etc. If,
in the course of generating a “q-tupleton-preference” portion of some
level, the theorem prover produces a clause containing p literals, where
p is less than q, the theorem prover reverts to the generation of
p-tupleton-preference portions.

Figure 6-3. A schematic indication of the refutation searches produced
by (a) resolution, (b) AF resolution, (c) AF resolution with an ordering
strategy.

If the empty clause can be produced at all within the level-bound n,
it will be produced by a theorem prover using the unit-preference
ordering strategy. Quite often the unit-preference strategy will enable a
theorem prover to greatly reduce the amount of resolution it does in
order  to  produce  a  refutation.  We  can  justify  this  intuitively  by  pointing 
out  that  the  unit-preference  strategy  directs  the  efforts  of  the  theorem 
prover  toward  those  clauses  that  contain  the  fewest  literals,  and  that 
it  is  relatively  rare  for  two  clauses  containing  many  literals  to  resolve 
directly  to  the  empty  clause. 

In  the  fewest-components  strategy  the  order  in  which  pairs  of 
clauses  are  resolved  depends  on  the  length  (number  of  literals,  or  total 
number  of  symbols)  of  their  resolvent.  This  strategy  usually  is  more 
costly  than  the  unit-preference  strategy  because  it  requires  that  the 
theorem  prover  compute  estimates  of  resolvent  lengths  for  the  pairs 
of clauses it has generated at a given level before generating their
resolvents.
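
Both ordering strategies amount to sorting the candidate pairs of
clauses before resolving them. The sketch below is one possible
reading, not a description of any existing program: clauses are
tuples of literal strings, the unit-preference key puts pairs that
involve the shortest clauses first, and the fewest-components key
uses the sum of the parents' lengths minus two as an assumed estimate
of resolvent length. Neither key checks that the pair can actually be
resolved.

```python
from itertools import combinations


def unit_preference_key(pair):
    """Units against units first, then units against doubletons, etc."""
    shorter, longer = sorted(len(c) for c in pair)
    return (shorter, longer)


def fewest_components_key(pair):
    """Estimated number of literals in a binary resolvent of the pair."""
    c1, c2 = pair
    return len(c1) + len(c2) - 2


def ordered_pairs(clauses, key):
    """All pairs of clauses, in the order the strategy would try them."""
    return sorted(combinations(clauses, 2), key=key)


clauses = [("P",), ("~P", "Q"), ("~Q", "R", "S"), ("~R",)]
for pair in ordered_pairs(clauses, unit_preference_key):
    print(pair)
```

The second key must be computed for every candidate pair before any
resolvent is generated, which is the extra cost noted above.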


Review 

The simplification, refinement, and ordering strategies discussed
in this section are all syntax-oriented: a theorem prover that uses them
searches more selectively for a refutation than it would if it did not use
these strategies, though it does so in a way that depends more on
the structures of the expressions it generates than on their
relations to each other. Its selectivity has little relation to the “meaning”
or semantics of the theorem it is trying to prove. It is desirable for a
theorem prover to be able to select the clauses it resolves in some
manner that is dependent on their relevance to the theorem it is trying to
prove. Also, it is desirable for a theorem prover to be able to form a
“plan” or description of a proof that has some likelihood of
corresponding to the actual proof for the theorem, and to select the clauses it
resolves according to how well they fit its plan. That this may be feasible
will be evident from the following sections.

Reasoning  by  Analogy 

Chapter  3  discussed  various  types  of  analogies  and  the  value  of 
“reasoning  by  analogy”  as  an  ability  of  a  general  problem  solver.  Kling 
(1971a,b)  developed  a  method  whereby  theorem  provers  can  select 
the  clauses  they  resolve  in  a  manner  that  corresponds  to  one  type  of 
reasoning  by  analogy.  The  presentation  of  his  method  in  this  section  is 
highly  schematic  and  is  intended  merely  to  give  the  reader  a  good  idea 
of  Kling’s  approach.  For  more  detailed  information  the  reader  should 
see  Kling’s  own  explications. 
Let us consider the case of a theorem prover being used to prove
theorems about abstract algebra.⁶ Such a theorem prover might have a
standard set Sa of clauses that would constitute its basic knowledge, or
axiomatization, of abstract algebra. The user of the theorem prover
would supply it with a theorem T to be proved, stated using the
predicates and functions that occur in Sa. The theorem prover would form
the negation of the theorem, reduce ¬T to clause form, add it to Sa
to form a set of clauses S, and attempt to prove the unsatisfiability of
S. This procedure could be followed for virtually any theorem T about
abstract algebra. In principle, the generality of theorem provers as
problem solvers depends only on the extent to which problems can be
described by sets of statements in the predicate calculus. As later
sections will show, it is certainly possible to express some aspects of
real-world problems within predicate calculus formalizations.

In fact, however, the generality of theorem provers as problem
solvers is limited by considerations of computation time and memory
space. A difficult theorem T might require 50 steps in the proof
generated for it by a good theorem prover, using an axiomatization Sa that
contained a dozen clauses. For such an Sa and T the theorem prover
might generate 200 clauses altogether before finding the proof. If the
theorem prover were given more axioms than necessary (say, Sa
containing 30 clauses), it might generate 600 clauses altogether before
finding a proof, and run out of space. That is, it would generate about 400
irrelevant clauses. Even with optimal use of the heuristic search
strategies discussed in the preceding section, current theorem provers usually
are unable to prove nontrivial theorems when Sa contains more than
about 30 clauses. And a good axiomatization Sa for a subject like
abstract algebra requires about 250 clauses.⁷

Thus, theorem provers as we have so far described them cannot
be general problem solvers for nontrivial subjects such as abstract
algebra. The axiomatizations (or data bases) for such subjects are simply
too large for a program (possessing current limitations in time and
memory space) to solve problems in them without some way of
estimating which clauses are “relevant” to the problem at hand and should
be resolved or generated first. We can expect the situation to be much
worse for real-world problems, where the number of clauses necessary
for an adequate axiomatization of reasoning-program knowledge about
the world might be very large indeed.⁸

⁶ For the purposes of this book it is not necessary to know abstract algebra.
It is chosen simply for the sake of exposition and because Kling chose it for
his work.

⁷ An axiomatization for a theory must contain not only the clauses that
interrelate the basic undefined words (“point,” “line,” “between,” etc.) of the
theory, but also those clauses that define the nonbasic words (“circle,”
“triangle,” “congruent”), predicates or functions, used within the theory.

The situation is amply illustrated with the use of “Venn diagrams”
(see Fig. 6-4). In proving a theorem T from a data base S, a theorem
prover might generate the set of clauses indicated by the area labeled
S1. Given the larger data base S' (which includes S as a subset), the
theorem prover will usually generate the much larger set of clauses S2
before it obtains a proof for T. In general, we want the theorem prover
to have the data base S' available, since there is no a priori information
as to which theorems it will be required to prove. But we would like
to have some program that could often select, for any given theorem
T, a data base S from which T could be easily derived. We could attempt
either to modify the theorem prover itself or to write a new program
that would select data bases for the theorem prover.


Figure 6-4. Venn diagram for the data-base problem.

Because of the undecidability of the predicate calculus, this
problem cannot be completely solved. However, Kling provided a partial
solution to it in the form of an analogy generator that, given some help,

⁸ Minsky (1968a, p. 26) makes a very rough estimate that would correspond
by  the  present  author’s  interpretation  to  10®  or  10®  clauses,  belonging  to  a  high- 
order  predicate  calculus,  for  a  reasoning  program  at  the  level  of  human  intel¬ 
ligence.  But,  of  course,  it  is  only  a  guess. 
is often capable of making a good selection for the data base S. Kling’s
analogy-generating program is called ZORBA-1.⁹ It is designed to operate
in conjunction with a theorem prover (see Green, 1969), of the type we
have described.¹⁰

ZORBA-1 is given the following information as input:

1. A theorem T, which is to be proved by QA3

2. A theorem T', which has already been proved

3. The proof of T'; that is, an ordered sequence of clauses
C1, ..., Cn such that each clause Ci is either an element of
S' (the large data base) or an element of ¬T', or the
resolvent of two clauses, say, Cj and Ck, which occur prior to
it in the sequence (i.e., such that j and k are both less than i)

4. The large data base S'

5. A “semantic template” for S'

The first three items of this list are problem-dependent; they vary with
the theorem T which is to be proved by QA3. The fourth and fifth items
do not depend upon T, but do depend on S'. To the extent that
ZORBA-QA3 is being used as a general problem solver relative to S', they
can be considered problem-independent.

Given this information, ZORBA-1 produces an analogy A consisting
of:

1. A^p, a one-to-one association (or map) between the predicates
used in the proof of T' and predicates that may occur in the
proof of T. That is, each predicate used in the proof of T'
is associated with exactly one predicate that occurs in S' and
might be used in the proof of T.

2. A^c, which associates each clause used in the proof of T' with
a set of clauses that each occur in S' and might be used in
the proof of T.

3. A^v, which associates sets of variables used in the proof of T'
with sets of variables that might be used in the proof of T.

These associations are represented in the computer by appropriate data
structures, and are referred to as predicate analogies, clause analogies,
and variable analogies, respectively. T' will be called an analog of
T, and vice versa.

⁹ ZORBA is an acronym for (ZO) Reasoning By Analogy. Zorba was a
passionate, intuitive Greek, and many contemporary thinkers consider analogy an
intuitive process outside the realm of reason (Kling, 1971a, p. 4).

¹⁰ QA3 was developed at Stanford Research Institute, principally by Green
and Raphael. It is resolution-based and uses the unit-preference heuristic.
However, it does not use ancestry-filter refinement, but instead uses the
set-of-support refinement (which has not been discussed; see Nilsson, 1971,
pp. 223-224).

Thus, an analogy developed by ZORBA consists of a predicate
analogy, a clause analogy, and a variable analogy. Predicate and variable
analogies are used within ZORBA-1. Its output to QA3 is a clause analogy
A^c. QA3 uses A^c, together with ¬T, as the data base which it attempts
to prove unsatisfiable. If QA3 succeeds, then we are justified in saying
that the two programs, ZORBA and QA3, have together “reasoned by
analogy” from the proof of T' to obtain a proof of T. Of course there
may be many types of analogical reasoning that are not described by
this particular paradigm. The importance of this paradigm to AI
research depends only on how well it works; that is, whether ZORBA and
QA3 are in fact able to prove theorems that could not be proved by QA3
alone, relative to the same large data base S'.
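
One way to picture the three associations is as three tables held in
a single record. The sketch below is purely illustrative: the class
name, the field names, and the clause labels are hypothetical and are
not taken from Kling's program. It shows only how a clause analogy
could be turned into the reduced data base handed to the theorem
prover.

```python
from dataclasses import dataclass, field


@dataclass
class Analogy:
    # predicate used in the proof of T' -> predicate that may be used for T
    predicate_map: dict = field(default_factory=dict)
    # clause used in the proof of T' -> set of clauses of S' for the proof of T
    clause_map: dict = field(default_factory=dict)
    # variables of the proof of T' -> variables for the proof of T
    variable_map: dict = field(default_factory=dict)

    def restricted_database(self, negated_theorem):
        """Clauses selected by the clause analogy plus those of the
        negated theorem: the set to be shown unsatisfiable."""
        db = set(negated_theorem)
        for image in self.clause_map.values():
            db |= set(image)
        return db


analogy = Analogy(
    predicate_map={"normal_group": "ideal"},
    clause_map={"c1": {"s17", "s42"}, "c2": {"s42", "s88"}},
)
print(sorted(analogy.restricted_database({"neg_T_1"})))
```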

In fact, the ZORBA-QA3 program pair is rather successful, at least
with respect to the data base for abstract algebra developed by Kling
(1971a). Kling’s abstract algebra data base S' contains 239 clauses.
Given two analogous theorems T and T' and a proof for T' requiring
20 clauses, ZORBA-1 could usually select a clause analogy A^c containing
fewer than two dozen clauses; that was sufficient for QA3 to use in proving
T. For the reader who is acquainted with abstract algebra, the following
example is quoted:

T':  “The  intersection  of  two  normal  groups  is  a  normal  group,” 

T:  “The  intersection  of  two  ideals  is  an  ideal.” 

Either of these theorems would have been unprovable for QA3, given
the data base S'. However, ZORBA-1 and QA3 together are able to develop
a proof for T, “reasoning by analogy” from a proof for T'.

In practice, ZORBA-1 must select its clause analogy heuristically
by searching through a space of partial analogies (see Fig. 6-5). For
an S' containing 239 clauses, the number of possible clause analogies,
each containing 24 clauses, is extremely large (about 10^33; see Kling,
1971a, p. 110). ZORBA-1 first develops a partial analogy A_1, which
is relatively small and contains fewer than a dozen clauses. For each
partial analogy A_i that it develops, it either adds or deletes a few clauses
in order to create A_{i+1}. ZORBA-1 is guided in its development of
partial analogies by the “semantic template” for S', which is provided
to it. Usually ZORBA-1 needs to generate fewer than ten partial analogies.
The semantic template is a small table of descriptions for the predicates
occurring in S'. For example, the predicate “group” is given the
description
Figure 6-5. Heuristic search through an analogy-space.


STRUCTURE  (SET;OPERATOR) 

Thus, ZORBA-1 knows automatically whenever it sees “group(A;*)”
that A is a set and * is an operator. ZORBA-1 uses the semantic template
to generate descriptions of those clauses occurring in S' and in the
proof of T'. The clauses it chooses from S' for its partial analogies are
those that have descriptions similar to the descriptions of the clauses in
the proof of T'. ZORBA-1 terminates its search when it has found analogs
for each of the clauses in the proof of T'. It then submits the resulting
clause analogy to QA3.
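
The size of the search space and the idea of choosing clauses by their descriptions can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the template entries other than “group”, the toy clauses, and the rule that two descriptions are “similar” when they are equal are all hypothetical, and are not taken from Kling's program.

```python
import math

# Number of ways to choose 24 clauses from a 239-clause data base.
n_subsets = math.comb(239, 24)
print(f"{n_subsets:.2e}")

# Hypothetical semantic template: predicate name -> (kind, argument types).
# Only the "group" entry is taken from the text; the rest are invented.
TEMPLATE = {
    "group": ("STRUCTURE", ("SET", "OPERATOR")),
    "ring": ("STRUCTURE", ("SET", "OPERATOR", "OPERATOR")),
    "subset": ("RELATION", ("SET", "SET")),
    "intersection": ("MAP", ("SET", "SET", "SET")),
}

def describe(clause):
    """Describe a clause (a list of (predicate, args) literals) by the
    sorted kinds of its predicates."""
    return sorted(TEMPLATE[pred][0] for pred, _args in clause)

def candidates(proof_clause, data_base):
    """Clauses whose description equals that of the proof clause
    (equality is an assumed stand-in for 'similar')."""
    target = describe(proof_clause)
    return [c for c in data_base if describe(c) == target]

proof_clause = [("group", ("G", "*")), ("subset", ("H", "G"))]
data_base = [
    [("ring", ("R", "+", "x")), ("subset", ("I", "R"))],
    [("intersection", ("A", "B", "C"))],
]
print(candidates(proof_clause, data_base))
```

The first data-base clause is selected because it pairs a STRUCTURE predicate with a RELATION predicate, as the proof clause does; the second is rejected.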
In summary, to reason successfully by analogy, ZORBA-1 and QA3
require certain essential information:

1. A theorem T', which is analogous to the theorem T that
ZORBA-QA3 is required to prove

2. A proof of T'

3.  A  semantic  template  for  S' 

At  the  moment,  this  information  must  be  provided  to  the  computer  by 
the  human  user  of  the  system.  The  development  of  programs  that  would 
be  capable  of  supplying  this  information  is  an  open  problem.  Even  so, 
ZORBA-1 is important because it does indicate a way for resolution-
based  theorem  provers  to  prove  theorems,  given  large,  nonoptimal 
axiomatizations. 


SOLVING  PROBLEMS  WITH  THEOREM 
PROVERS 

State-Space 

A state-space problem as defined in Chapter 3 consists
of a description of a set S of possible starting states, a set F of operators
that convert one state into another, and a set G of goal states. The
problem implied by such a description is to find a sequence of operators,
the application of which will convert some starting state into a goal
state. A description of this sort implicitly defines a state-space consisting
of a set of states and various possible paths between them. The “difficulty”
of a state-space problem is at most a matter of the size of the
state-space and the relative proportion of solution paths to nonsolution
paths. In some cases it is possible to logically analyze a description of
a state-space problem and show that its solutions are equivalent to those
for a problem with a simpler description, a smaller state-space or set of
operators. With this possibility in mind, the “problem-reduction” problems,
the “problem of problem-representation,” and “global” analysis of
games were discussed in Chapter 3.
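
The definition above (starting states S, operators F, goal states G) can be made concrete with a minimal Python sketch. The breadth-first strategy and the toy arithmetic problem are assumptions chosen for brevity; they are not a method prescribed by the text.

```python
from collections import deque

def solve(starts, operators, is_goal):
    """Breadth-first search for a sequence of operator names that converts
    some starting state into a goal state. Returns None if none exists."""
    frontier = deque((s, []) for s in starts)
    seen = set(starts)
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for name, op in operators.items():
            nxt = op(state)          # None means "operator not applicable"
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

# Hypothetical toy problem: states are integers 1..20, the goal is 10.
def double(n):
    return n * 2 if n * 2 <= 20 else None

def add1(n):
    return n + 1 if n + 1 <= 20 else None

plan = solve([1], {"double": double, "add1": add1}, lambda n: n == 10)
print(plan)
```

Because the search is breadth-first, the returned path is one of the shortest operator sequences for this toy problem.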

This  section  describes  how  resolution-based  theorem  provers  can 
be used as general problem solvers for state-space problems. We consider
three questions:

1.  How  can  predicate-calculus  theories  be  used  to  describe  state- 
space  problems? 

2.  How  can  theorem  provers  be  used  to  construct  paths  through 
state-spaces? 
3. To what extent can the techniques we describe be applied to
real-world  problems? 

Our  discussion  of  these  questions  is  based  on  McCarthy  (1959, 
1963a,  1964,  1968),  McCarthy  and  Hayes  (1968),  Black  (1964), 
Green  (1969),  Waldinger  and  Lee  (1969),  Amarel  (1970),  Nilsson 
(1971),  and  Fikes  and  Nilsson  (1971).  Other  relevant  papers  include 
Amarel  (1967,  1968),  Slagle  (1965,  1970),  and  Slagle  and  Bursky 
(1968). 

Predicate-Calculus  Descriptions  of 
State-Space  Problems 

A  given  state-space  problem  can  usually  be  described  in  many 
ways,  depending  on  how  one  chooses  to  describe  its  states  and  operators. 
The state-space problems discussed previously have the following
features:

1.  Each  state  can  be  described  as  a  collection  of  objects,  each 
having  certain  relations  to  the  others.  For  example,  in  the 
15-Puzzle,  the  “objects”  might  be  blocks,  positions,  and  the 
null-block  or  empty  position. 

2.  Each  operator  can  be  described  as  a  procedure  for  changing 
one  state  into  another.  For  a  given  state,  zero,  one,  or  many 
operators  might  be  applicable. 

Thus,  our  terminology  for  state-space  problems  includes  “states,” 
“objects,” “relations,” and “operators.” A predicate-calculus description
of a state-space problem may reflect these concepts in the following
ways: 

1. “States” and “objects” can be represented by variable symbols
called state (or situation) variables and object variables.
In this discussion, . . . represent state variables and
. . . represent object variables. Particular constant
states and objects will be denoted by s0, s1, s2, . . . and a, b, c,
. . . , box, monkey, . . . , respectively.

2.  “Relations”  between  objects,  and  properties  of  states  and 
actions  can  be  indicated  by  fluent  symbols,  which  are  either 
predicate  or  function  symbols.  We  are  primarily  concerned 
with  situational  fluents  (McCarthy  and  Hayes,  1968),  which 
are  functions  or  predicates  that  include  states  among  their 
arguments. For example, “ON(monkey, box, s0)” might be
a situational-fluent predicate with the value “true” if the object
“monkey” is on the object “box” in state s0. Similarly,
“MOVE(monkey, box, a, b, s0)” might be a situational-fluent
function which has as its value the state s produced
when the object “monkey” moves the object “box” from
position a to position b in state s0.

First-order predicate calculus formulas that use the ordinary logic symbols
(¬, ∧, ∨, ⇒, ∃, ∀) and use state variables, object variables, and
fluent  symbols  as  their  nonlogical  symbols,  can  be  used  to  express  facts 
about  state  spaces.  A  collection  of  such  formulas  can  be  considered  a 
description  of  a  state-space,  provided  the  collection  is  satisfiable  and 
contains  formulas  that  use  situational-fluent  functions  which  represent 
the  operators  associated  with  the  state-space.  Problems  associated  with 
the  state-space  can  be  represented  by  formulas  that  are  to  be  proved, 
using  the  formulas  that  describe  the  state-space. 
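
As a rough illustration of situational fluents, the following Python sketch treats a situational-fluent function as a constructor of state terms and a situational-fluent predicate as a function that inspects such a term. The initial positions, the rule that an inapplicable move leaves everything unchanged, and the function names are assumptions made for this sketch only.

```python
S0 = "s0"
# Hypothetical initial positions of the objects in state s0.
INITIAL = {"monkey": "a", "box": "a", "bananas": "c"}

def move(agent, obj, src, dst, s):
    """Situational-fluent function: its value is the state produced when
    agent moves obj from src to dst in state s (represented as a term)."""
    return ("move", agent, obj, src, dst, s)

def at(obj, place, s):
    """Situational-fluent predicate: is obj at place in state s?"""
    if s == S0:
        return INITIAL.get(obj) == place
    _op, agent, moved, src, dst, prev = s
    # Assumption: the move takes effect only if both were at src.
    applicable = at(agent, src, prev) and at(moved, src, prev)
    if applicable and obj in (agent, moved):
        return place == dst
    return at(obj, place, prev)   # everything else is unchanged

s1 = move("monkey", "box", "a", "b", S0)
print(at("box", "b", s1), at("bananas", "c", s1), at("box", "a", s1))
```

The last line of `at` plays the role of the axioms that say which relations an operator leaves untouched.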

EXAMPLE 6-1. THE MONKEY-AND-BANANAS PROBLEM. This problem
(McCarthy, 1963a) was given as an exercise in Chapter 3.
It is one of the classic “toy problems” considered by AI researchers
as an example of an extremely simple problem that
involves common-sense reasoning about situations, actions, tools,
etc. The problem is repeated here along with a predicate-calculus
formalization for it:

A  monkey  is  in  a  room  where  a  bunch  of  bananas  is 
hanging  from  the  ceiling,  too  high  to  reach.  In  the  corner 
of  the  room  is  a  box,  which  is  not  under  the  bananas. 
How  can  the  monkey  get  the  bananas?  The  solution  to 
the  monkey’s  problem  is  to  move  the  box  under  the 
bananas  and  climb  onto  the  box,  from  which  the  bananas 
can  be  reached. 

The objects used in our description of this state-space problem
are monkey, box, bananas, place1, place2, place3. The operators
used are goto, move, climb, and reachfor, each of which
will be a situational-fluent function. The relations used in the
description are under, on, at, and has-bananas, each of which
will be a situational-fluent predicate. Table 6-1 gives the first-order
predicate calculus formulas that correspond to a description
of the monkey-and-bananas state-space, using these objects,
operators, and relations. The monkey’s problem is represented
by the formula

(∃s)(has-bananas(s))    (6-1)
which is to be proved, using the formulas in Table 6-1. The
formulas in Table 6-1 do not say explicitly that it is possible
for the monkey to get the bananas. However, if we can prove
formula 6-1 from the formulas in Table 6-1, then we may conclude
that there is some sequence of applications of the operators
that will convert state s0, in which the monkey does not have the
bananas, into a state s, in which he does. We can conclude this
because the only state that is explicitly mentioned in Table 6-1
is the state s0. If formula 6-1 holds for all models of the formulas
in Table 6-1, then it must hold for the model in which the only
states that exist are s0 and those which can be obtained from s0
by the application of some sequence of operators to it. Thus, if
the formulas of Table 6-1 logically imply formula 6-1, then
there is some way for the monkey to convert s0 into a state
s for which “has-bananas(s)” is true.

Figure 6-6 shows the proof of formula 6-1, using the formulas
in Table 6-1 and the resolution procedure. The negation
of formula 6-1 is added to the set of formulas in Table 6-1, and
the resulting set of formulas is shown to be unsatisfiable. Thus,
the set of formulas in Table 6-1 logically implies that the monkey
can get the bananas.
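
The claim that some state reachable from s0 satisfies has-bananas can be checked informally by enumerating state terms. The Python sketch below is a plain state-space search, not a resolution proof. The monkey's starting place (p1), the compact state tuple, and the restriction that a monkey on the box can only reach for the bananas are assumptions added for the sketch; the operator effects loosely follow the axioms of Table 6-1.

```python
from collections import deque

PLACES = ("p1", "p2", "p3")
BANANAS_AT = "p3"

def successors(state, term):
    """Yield (next_state, next_term) pairs. A state is the tuple
    (monkey_place, box_place, monkey_on_box, has_bananas)."""
    monkey, box, on_box, has = state
    if not on_box:
        for p in PLACES:                     # goto moves only the monkey
            if p != monkey:
                yield (p, box, False, has), f"goto({p},{term})"
        if monkey == box:
            for p in PLACES:                 # move carries the box along
                if p != box:
                    yield (p, p, False, has), f"move(mon,box,{box},{p},{term})"
            yield (monkey, box, True, has), f"climb(mon,box,{term})"
    elif box == BANANAS_AT and not has:      # on the box, under the bananas
        yield (monkey, box, True, True), f"reachfor(mon,ban,{term})"

def find_situation(start=("p1", "p2", False, False)):
    """Breadth-first search for a state term in which has-bananas holds."""
    frontier = deque([(start, "s0")])
    seen = {start}
    while frontier:
        state, term = frontier.popleft()
        if state[3]:
            return term
        for nxt, nxt_term in successors(state, term):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, nxt_term))
    return None

print(find_situation())
```

The term that is printed has the same nested shape as the state that appears in the final clause of the resolution proof.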

Simply proving that the monkey can get the bananas is not, of
course, the same thing as showing a way for him to do it. We would
like our proof of the existentially quantified formula 6-1 to be constructive;
that is, we would like it to produce an actual sequence of operations
which, when applied to s0, will produce a state s for which “has-bananas(s)”
will be true. Green (1969b) was the first to devise a resolution-based
theorem prover capable of supplying constructive proofs
for existentially quantified formulas.


TABLE 6-1A. The Monkey and Bananas Problem (Predicate-calculus Axioms)*

A1.  ∀p∀p'∀s(at(box,p,s) → at(box,p,goto(p',s)))
A2.  ∀p∀p'∀s(at(bananas,p,s) → at(bananas,p,goto(p',s)))
A3.  ∀p∀s(at(monkey,p,goto(p,s)))
A4.  ∀p∀p'∀s(∧(at(box,p,s),at(mon,p,s)) → at(box,p',move(mon,box,p,p',s)))
A5.  ∀p∀p'∀p''∀s(at(ban,p,s) → at(ban,p,move(mon,box,p',p'',s)))
A6.  ∀p∀p'∀s(at(mon,p,s) → at(mon,p',move(mon,box,p,p',s)))
A7.  ∀s(under(box,ban,s) → under(box,ban,climb(mon,box,s)))
A8.  ∀p∀s(∧(at(mon,p,s),at(box,p,s)) → on(mon,box,climb(mon,box,s)))
A9.  ∀s(∧(under(box,ban,s),on(mon,box,s)) → has-bananas(reachfor(mon,ban,s)))
A10. ∀s(∧(at(box,p3,s),at(ban,p3,s)) → under(box,ban,s))
A11. ∧(at(box,p2,s0),at(ban,p3,s0))

* (mon = monkey, ban = bananas)


TABLE 6-1B. Monkey and Bananas (Clause Form)

A1.  {¬at(box,p,s), at(box,p,goto(p',s))}
A2.  {¬at(bananas,q,q''), at(bananas,q,goto(q',q''))}
A3.  {at(monkey,r,goto(r,r'))}
A4.  {¬at(box,u,v), ¬at(mon,u,v), at(box,u',move(mon,box,u,u',v))}
A5.  {¬at(ban,t,t'''), at(ban,t,move(mon,box,t',t'',t'''))}
A6.  {¬at(mon,v',v'''), at(mon,v'',move(mon,box,v',v'',v'''))}
A7.  {¬under(box,ban,w), under(box,ban,climb(mon,box,w))}
A8.  {¬at(mon,w',w''), ¬at(box,w',w''), on(mon,box,climb(mon,box,w''))}
A9.  {¬under(box,ban,x), ¬on(mon,box,x), has-bananas(reachfor(mon,ban,x))}
A10. {¬at(box,p3,y), ¬at(ban,p3,y), under(box,ban,y)}
A11. {at(box,p2,s0)}
A12. {at(bananas,p3,s0)}

Negated Conjecture (NC): {¬has-bananas(z)}

Consequences of the Axioms (Fig. 6-6)

C1.  {at(box,p2,goto(p',s0))}
C2.  {¬at(mon,p2,goto(p',s0)), at(box,u',move(mon,box,p2,u',goto(p',s0)))}
C3.  {at(box,u',move(mon,box,p2,u',goto(p2,s0)))}
C4.  {¬at(ban,p3,move(mon,box,p2,p3,goto(p2,s0))),
      under(box,ban,move(mon,box,p2,p3,goto(p2,s0)))}
C5.  {at(bananas,p3,goto(q',s0))}
C6.  {at(ban,p3,move(mon,box,t',t'',goto(q',s0)))}
C7.  {under(box,ban,move(mon,box,p2,p3,goto(p2,s0)))}
C8.  {under(box,ban,climb(mon,box,move(mon,box,p2,p3,goto(p2,s0))))}
C9.  {at(mon,v'',move(mon,box,r,v'',goto(r,r')))}
C10. {¬at(box,v'',move(mon,box,r,v'',goto(r,r'))),
      on(mon,box,climb(mon,box,move(mon,box,r,v'',goto(r,r'))))}
C11. {on(mon,box,climb(mon,box,move(mon,box,p2,u',goto(p2,s0))))}
C12. {¬on(mon,box,climb(mon,box,move(mon,box,p2,p3,goto(p2,s0)))),
      has-bananas(reachfor(mon,ban,climb(mon,box,move(mon,box,p2,p3,goto(p2,s0)))))}
C13. {has-bananas(reachfor(mon,ban,climb(mon,box,move(mon,box,p2,p3,goto(p2,s0)))))}


Path  Finding,  Example  Generation,  Constructive 
Proofs,  Answer  Extraction 

This section presents Luckham and Nilsson’s (1971) generalization
of Green’s technique. This technique is illustrated by a simple
problem,  and  then  its  application  to  the  monkey-and-bananas  problem 
is  described. 

Figure  6-6.  A  proof  that  the  monkey  can  get  the  bananas. 

(See  Table  6-1.) 

Let  us  suppose  we  are  given  a  simple  set  Sa,  which  contains  only 
the  axiom 

∧(((∃u P(b,u,u)) ⇒ ∀u P(b,u,u)), ∨(∀w∃r P(a,w,r), P(b,b,b)))    (6-2)

and let us suppose we are asked to prove the conjecture

∃x∀y∃z P(x,y,z)    (6-3)

in  a  constructive  fashion;  that  is,  we  wish  to  find  values  for  the  variables 
x,y,z  such  that  formula  6-3  will  be  true.  Our  standard  procedure  is  to 
convert  formula  6-2  and  the  negation  of  formula  6-3  into  clause-form 
expressions  and  attempt  to  show  that  the  set  S  that  contains  them  is 
unsatisfiable.  Figure  6-7a  shows  the  use  of  the  resolution  principle 
to  construct  a  refutation  graph  and  demonstrate  the  unsatisfiability  of 
S. (Such a refutation graph could be easily obtained by a current resolution-based
theorem prover.)

Figure 6-7. A simple refutation graph (A) and its modification (B) to
produce an example.

Green’s technique provides a way to construct the required example
of formula 6-3, using a refutation graph such as Fig. 6-7a. Once
a theorem prover has derived a refutation graph that proves the initial
conjecture (disproves the negated conjecture), an “example-constructing
program” can be used, which does the following things:

1. First, for each application of resolution in the graph, the
literals that are unified by that resolution are marked. In
Fig. 6-7 these literals are underlined.

2. New variables are substituted for any Skolem functions occurring
in the clauses coming from the negation of the conjecture.
Thus, the variable s is substituted for the Skolem
function f(x).

3. Any clause in the graph which comes directly from the
negation of the conjecture is converted into a tautology by
adding its own negation to it. Thus, the clause

{¬P(x,s,z)}

is converted into the clause

{¬P(x,s,z), P(x,s,z)}

4.  Following  the  structure  of  the  original  refutation  graph,  a 
modified  graph  is  constructed.  Each  resolution  in  the  modi¬ 
fied  graph  unifies  the  same  literals  as  are  unified  in  the 
original  refutation  graph  (i.e.,  the  marked  literals).  If  a 
clause being resolved contains “tautology literals” added to
it  in  step  3,  the  variables  in  the  tautology  literals  receive  the 
same  instantiations  as  they  do  elsewhere  in  that  resolution. 

5.  The  clause  at  the  bottom  of  the  modified  graph  is  an  example 
of values for the existentially and universally quantified
variables occurring in the conjecture, for which the conjecture
will be true.
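
Step 4 can be illustrated with a small Python sketch: the tautology literal P(x,s,z) receives the same substitution that unifies the marked literal. The term encoding (variables as strings, constants and function applications as tuples), the function names, and the omission of an occurs check are assumptions of this sketch, not part of the published technique.

```python
def walk(t, subst):
    """Follow variable bindings until an unbound variable or a compound term."""
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def unify(x, y, subst):
    """Return a substitution extending subst that unifies x and y, or None.
    Variables are strings; constants and compound terms are tuples."""
    x, y = walk(x, subst), walk(y, subst)
    if x == y:
        return subst
    if isinstance(x, str):
        return {**subst, x: y}
    if isinstance(y, str):
        return {**subst, y: x}
    if x[0] != y[0] or len(x) != len(y):
        return None
    for xa, ya in zip(x[1:], y[1:]):
        subst = unify(xa, ya, subst)
        if subst is None:
            return None
    return subst

def substitute(t, subst):
    """Apply a substitution to a term."""
    t = walk(t, subst)
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(substitute(a, subst) for a in t[1:])

A, B = ("a",), ("b",)
# Two uses of the tautology clause, with variables renamed apart.
first = unify(("P", "x1", "s1", "z1"), ("P", B, "v", "v"), {})
second = unify(("P", "x2", "s2", "z2"), ("P", A, "w", ("g", "w")), {})
answer = [substitute(("P", "x1", "s1", "z1"), first),
          substitute(("P", "x2", "s2", "z2"), second)]
print(answer)
```

The two instantiated literals correspond to the clause {P(b,v,v), P(a,w,g(w))}.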

Figure  6-7b  shows  the  modified  graph  obtained  from  Fig.  6-7a.  The 
clause  at  the  bottom  node  is  equivalent  to 

∀y ∨(P(a,y,g(y)), P(b,y,y))

where g is a Skolem function introduced during the construction of the
refutation tree. We may interpret this clause as saying, “Either x = a,
y = y (i.e., y has any value), z = g(y), or x = b, y = y, z = y (i.e., y
has any value and z must equal y) will make the conjecture true, and at
least one of these two cases must be a valid example for any model of
the axioms.” The presence of a Skolem function indicates that our
solution is, to some extent, general; there are many models for formulas
6-2 and 6-3, and each model contains its own set of values for x,y,z
which will satisfy the quantifiers in formula 6-3. The Skolem function
indicates that the values of certain variables depend on the particular
model and on the values for other variables one happens to choose.


Figure 6-8. Modifying the refutation tree for the monkey-and-bananas
state-space. In this case only the bottom part of the tree is affected by
the modification process.

The generality of the examples constructed by this technique depends
on the refutation graph it is given. Often there will be many ways
to prove that a given set of clauses is unsatisfiable, and different ways
will yield different examples. Because of the undecidability of the predicate
calculus, there usually is no way to guarantee that a given example
is the most general.

We  can  prove  the  validity  of  this  example-constructing  technique 
by  observing  that  the  modified  graph  represents  the  inference  of  the 
example  from  the  axiom  6-2  and  a  tautology,  consisting  of  formula 
6-3 and its negation. Since the inference itself is valid (i.e., the resolution
principle is valid and has been applied correctly), and since a
tautology  is  always  true,  the  example  that  is  constructed  must  be  correct. 

Figure 6-8 shows the application of Green’s example-constructing
technique to the Monkey-and-Bananas Problem, modifying the refutation
graph shown in Fig. 6-6. To get his bananas, the monkey should
interpret the expression shown at the bottom of the graph, working outward
from the innermost subexpression “goto(place2,s0).” Thus, he
should perform the following sequence of actions:

goto place2

move monkey, box, from place2 to place3

climb monkey, box

reachfor monkey, bananas

Green (1969a) used his constructive-proof technique to obtain a similar
solution for the monkey (note 6-3).
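
Reading the answer term “from the inside out” is mechanical, as the following Python sketch suggests. The nested-tuple encoding of the term, with the previous state as the last argument of each operator, is an assumption made for the sketch.

```python
def plan_from_term(term, initial="s0"):
    """Unwind a nested state term into the list of actions that builds it,
    innermost (earliest) action first."""
    steps = []
    while term != initial:
        *action, term = term      # last argument is the previous state
        steps.append(tuple(action))
    steps.reverse()
    return steps

answer = ("reachfor", "mon", "ban",
          ("climb", "mon", "box",
           ("move", "mon", "box", "p2", "p3",
            ("goto", "p2", "s0"))))

for step in plan_from_term(answer):
    print(step)
```

The printed steps follow the same order as the four actions listed above.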

To  summarize,  the  use  of  resolution-based  theorem  provers  to 
solve  state-space  problems  involves 

1.  Describing  state-spaces  by  means  of  sets  of  predicate  calculus 
formulas. 

2.  Expressing  problems  as  conjectures  to  be  proved. 

3.  Finding  proofs  for  conjectures,  using  the  resolution  principle 
and  various  heuristic  search  strategies  (refinement  methods, 
etc.). 

4.  Using  Green’s  technique  to  convert  a  resolution  proof  of 
the  unsatisfiability  of  the  negated  conjecture  into  an  example 
of  the  conjecture’s  truth. 

5.  Interpreting  this  example  as  a  description  of  the  solution  to 
the  state-space  problem. 

The  Exercises  at  the  end  of  this  chapter  show  how  this  process  can  be 
applied to other state-space problems (besides the Monkey-and-Bananas
Problem)  that  were  discussed  in  Chapter  3. 

Applications  to  Real-World  Problems 

The extent to which theorem provers can be used for solving real-world
problems depends on several factors, including how well predicate
calculus can be used to describe real-world situations and actions,
and how efficiently theorem provers can be used to find solutions to
problems that are given predicate calculus formalizations. This section
concludes with a limited discussion of these factors. The reader is encouraged
to see Green (1969a), McCarthy and Hayes (1968), and
Hewitt (1969, 1970) (discussed in the next section) for more on this
subject.

Since, presumably, any mathematical theory can be expressed as
a system of predicate calculus formulas, there is no doubt that predicate
calculus offers a metaphysically adequate mathematical framework for
the description of the real world, if any such framework can be constructed
at all. Our major questions must concern its epistemological
adequacy (how well it can represent everyday aspects of the real world)
and its heuristic adequacy (how well it can be used to express information
that is helpful in solving problems).

The epistemological adequacy of predicate calculus is probably
satisfactory  for  the  construction  of  real-world  problem  solvers.  Green, 
McCarthy,  and  Hayes  have  shown  that  predicate  calculus  can  be  used 
to  provide  formalizations  for  such  aspects  of  the  real  world  as  time- 
dependency,  causality,  and  ability.  The  Monkey-and-Bananas  Problem 
illustrates that predicate calculus can be used to formally express concepts
involving objects and spatial relations. Other examples can be
provided (see the Exercises and the next section) to show that predicate
calculus may be used to represent problems that have solutions which
are disjunctive, conditional, and which contain loops or recursive definitions.
Perhaps the major questions involve the desirability of using
modal  and  many-valued  logics  instead  of  predicate  calculus,  and  the 
question  of  whether  higher  than  first-order  predicate  calculus  can  be 
used  successfully. 

There are strong intuitive reasons for suspecting that many-valued
logics are more desirable than predicate calculus. A machine capable
of solving problems in a real-world environment must have some way
of dealing with ambiguities, inaccuracies, probabilities, multiple interpretations,
etc. Chapter 3 presented a list of some aspects of the real
world which should be easily representable in the reasoning-language
used by such a machine. Again, it is clear that each of these aspects of
the real world could be embodied in a predicate calculus machine if
they can be embodied in any machine at all. However, any such embodiment
in a predicate calculus machine would require a set of axioms
to define the functions and predicates that were associated with each of
these aspects. The question is whether some other logic, which had
these axioms built into its logic symbols and inference rules, would be
more efficient. This question must be considered in light of the fact
that no completely satisfactory many-valued logic has yet been developed.
Perhaps it will be necessary to develop a system with a variable-valued
logic, one that would be able to learn various functions and
predicates and build the most useful ones into its logical apparatus.

As for the use of higher than first-order predicate calculus, essentially
the same arguments apply. First-order predicate calculus is epistemologically
adequate, but it seems likely that a higher-order system
would be much more efficient. Hewitt (1969, 1971) developed a theorem-proving
language (described in the next section) that in many
respects is more powerful than omega-order predicate calculus.

Hewitt’s  work  was  also  concerned  with  the  heuristic  adequacy  of 
predicate calculus. He showed that it is possible not only to use predicate
calculus formulas as statements of facts, but also to use them as
recommendations  for  how  to  proceed  in  solving  problems. 


The other major question concerns how efficiently theorem provers
can be used to find solutions to problems that are given predicate calculus
formalizations. In considering this question, it is well to point out
that Green’s technique and resolution-based theorem provers have so
far been applied only within given state-space problems. The issue of
whether theorem-proving techniques can be used to logically analyze a
given state-space problem, and show that its solutions are equivalent to
those for a problem with a simpler description (smaller state space or
set of operators), remains undecided.

Another problem in the efficient use of theorem provers is the
frame problem, described in McCarthy and Hayes (1968). The frame
problem arises from the fact that, in a state-space problem, an application
of an operator to a state will usually affect some relations between
objects in the state and not affect others. In the predicate calculus
formalization for such a problem, there must generally be axioms for
each operator to express both the relations that are and are not changed
by the application of that operator. For example, in the Monkey-and-Bananas
Problem we had to state and use the fact that the application
of the operator climb would not affect the position of the box (see
Table 6-1A). Whenever it is necessary to make use of the fact that a
certain relation still holds in a given state, the theorem prover must
prove it, using the axioms for each of the operators that have been applied
since the relation was last shown to be true. This, of course, greatly
increases the work that must be done by the theorem prover.

Various  techniques  for  overcoming  the  frame  problem  have  been 
investigated,  notably  by  Fikes  and  Nilsson  (1971)  and  Hewitt  (1969, 
1971). Fikes and Nilsson present a GPS-like program that controls the
application of a theorem-proving program to various sets S_i of clauses,
each set S_i representing a given state in a state space. Each operator has
associated with it a collection of “delete” and “add” instructions that
identify the relations changed by the application of that operator. The
program (called STRIPS) performs a heuristic search in the state space
until it finds a sequence of operators that will produce a set S_g of clauses
containing the desired relations. Figure 6-8 shows STRIPS solving an
expanded problem of the Monkey-and-Bananas type. STRIPS controls
a robot (Shakey), which performs tasks in a real-world environment,
as indicated by Fig. 6-9. (Also see Chapter 9.)
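
The “delete” and “add” idea can be sketched in a few lines of Python. This is not the STRIPS program itself: the breadth-first search stands in for its heuristic search, states are sets of ground facts rather than clauses handled by a theorem prover, and every operator and fact name below is hypothetical.

```python
from collections import deque
from typing import FrozenSet, NamedTuple

class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]      # facts that must hold before application
    delete: FrozenSet[str]   # facts removed by the operator
    add: FrozenSet[str]      # facts added by the operator

def apply_op(op, state):
    """Facts not named in the delete list carry over unchanged, so no
    separate frame axioms are needed."""
    return (state - op.delete) | op.add

def make_plan(initial, goal, operators):
    """Breadth-first search for operator names that reach a state containing goal."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for op in operators:
            if op.pre <= state:
                nxt = apply_op(op, state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None

f = frozenset
OPS = [
    Operator("goto_box", f({"robot_at_door"}), f({"robot_at_door"}), f({"robot_at_box"})),
    Operator("push_box_to_switch", f({"robot_at_box", "box_at_corner"}),
             f({"robot_at_box", "box_at_corner"}), f({"robot_at_switch", "box_at_switch"})),
    Operator("climb_on_box", f({"robot_at_switch", "box_at_switch"}), f(), f({"robot_on_box"})),
    Operator("turn_on_light", f({"robot_on_box", "box_at_switch"}), f({"light_off"}), f({"light_on"})),
]
steps = make_plan({"robot_at_door", "box_at_corner", "light_off"}, f({"light_on"}), OPS)
print(steps)
```

Each successor state is computed directly from the previous set of facts, which is the saving that the add and delete lists are meant to provide.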

Hewitt’s  approach  to  the  frame  problem  was  similar.  He  defined 
a general class of procedures for manipulating data bases (sets of expressions)
that include the S_i sets of Fikes and Nilsson. This approach
is  discussed  in  the  next  section. 

Tasks

1. Turn on the lightswitch

Goal wff: STATUS(LIGHTSWITCH1,ON)

STRIPS solution: {goto2(BOX1), climbonbox(BOX1), climboffbox(BOX1),
pushto(BOX1,LIGHTSWITCH1), climbonbox(BOX1), turnonlight(LIGHTSWITCH1)}

2. Push three boxes together

Goal wff: NEXTTO(BOX1,BOX2) ∧ NEXTTO(BOX2,BOX3)

STRIPS solution: {goto2(BOX1), pushto(BOX1,BOX2), goto2(BOX3), pushto(BOX3,BOX2)}

3. Go to a location in another room

Goal wff: ATROBOT(f)

STRIPS solution: {goto2(DOOR1), gothrudoor(DOOR1,ROOM1,ROOM5),
goto2(DOOR4), gothrudoor(DOOR4,ROOM5,ROOM4), goto1(f)}

Figure 6-9. Tasks for STRIPS (initially at position e) and its solutions.
(Fikes and Nilsson, 1971, reprinted with permission.)

THEOREM PROVING IN PLANNING AND
AUTOMATIC  PROGRAMMING 

Planning 

Chapter 3 stressed the value of planning as a process to be performed
by a general problem solver; for many real-world problems it is
impossible to specify a single sequence of operations that will invariably
achieve one’s goal. The first and most general solution is a plan. Plans
typically describe many alternate sequences of actions and specify conditions
according to which different sequences will be followed. “Planning”
(i.e., developing a plan) is itself one of the actions that a plan
might dictate, and can be considered as an aspect of problem reduction.
The feasibility of using program-like structures, such as nondeterministic
programs and fuzzy algorithms, to represent plans, has been mentioned.

The  use  of  theorem-proving  techniques  in  planning  is  still  at  the 
stage  of  preliminary  investigation.  The  few  results  that  have  so  far  been 
achieved  indicate  that  it  may  be  possible  to  use  theorem  provers  to 
construct  plans  for  the  solution  of  real-world  problems.  Current  investi¬ 
gations  have  followed  essentially  two  approaches  (note  6-4)  to  the 
development  of  theorem-proving  plan  makers:  Hewitt  (1969,1971) 
developed the programming language PLANNER, which permits the
statement  and  execution  of  plans  in  a  theorem-proving  format;  other 
researchers  (e.g.,  Green,  1969a;  Waldinger  and  Lee,  1969;  McCarthy, 
1962;  R.  W.  Floyd,  1967;  Manna,  1969,1970)  demonstrated  that  it 
is  possible  to  use  resolution-based  theorem  provers  to  develop  computer 
programs,  and  that  it  is  often  possible  for  people  to  prove  whether  or 
not a given computer program is “correct.”

Planner 

A  programming  language  is  a  way  of  describing  procedures  to 
computers;  a  description  of  a  procedure,  written  in  a  programming  lan¬ 
guage,  is  a  program.  Computers  with  a  given  “language  capability” 
can  accept  programs  written  in  that  language  and  carry  out  the  pro¬ 
cedures  they  describe.  Up  to  now  it  has  not  been  necessary  to  consider 
any  specific  programming  languages,  since  these  discussions  have  been 
more  concerned  with  procedures  than  programs.  Thus,  “procedure” 
and “program” have been used somewhat interchangeably.* Theoretically,
any procedure that can be performed by a program written in
one programming language can be performed by some program in any
other given programming language. The difference between programming
languages lies in the simplicity with which various procedures
can be stated by their programs. Thus, associative data processing is
harder to do in FORTRAN than it is in SAIL (a programming language
used at the Stanford Artificial Intelligence Project; see Feldman et al.,
1972).

* The text has also been somewhat informal on this point in other respects.
Thus, programs have often been said to “do” something or to “perform” some
task when in actuality it is the computer that does or performs the procedure
described by the program. This informality will be maintained.

PLANNER is a significant new programming language for artificial
intelligence research (see Hewitt, 1968 et seq.). Some of the things that
can be done easily with PLANNER will be listed, though space does not
allow more than a brief presentation of the language itself. Hewitt’s
work founded a new genus of programming languages for AI research.
Among these are QA4 (Rulifson, 1971), CONNIVER (Sussman and McDermott,
1972a,b), and SAIL (Feldman et al., 1972). PLANNER is
still in the process of being implemented; however, an early version of
PLANNER (MICRO-PLANNER; see Sussman and Winograd, 1970; Baumgart,
1972) has been operational for more than a year. LISP is a very
desirable background to these languages, and we also suggest reference
to McCarthy et al. (1965), Weissman (1967), and Teitelman et al.
(1972).

PLANNER is a programming language for the manipulation of
data bases. A data base is some set of expressions which a PLANNER
program may treat as assertions of knowledge about the world. A program
written in PLANNER is a description of a plan for changing the
assertions in a data base (or perhaps creating a new data base) depending
on the assertions that are already in the data base. The fundamental
mechanism that makes PLANNER work is matching (see
Chapter 5): a PLANNER program (or “theorem”) may use pattern
matching to search a data base for certain expressions and, if it finds
them, add a new expression to the data base. Or a PLANNER program
may use pattern matching to search the data base for other programs
(theorems) that are designed to add certain expressions to a data base.
Thus, the PLANNER “consequent” theorem:

(THCONSE (X) (FALLIBLE $?X)
         (THGOAL (HUMAN $?X)))

is a program specifying a procedure to follow in order to add an assertion
of the form (FALLIBLE $?X) to a data base; this procedure consists
of attempting to satisfy the statement (THGOAL (HUMAN $?X)), which
can be done if the pattern matcher finds an assertion of the form
(HUMAN $?X) in the data base, or if the pattern matcher finds another
consequent theorem in the data base, of the form

(THCONSE  (Y)  (HUMAN  $?Y)  . . .  ) 

and this theorem can be successfully executed with Y bound to the
value of X. If the THGOAL statement can be satisfied in either of these
ways, and the value of X is, say, SOCRATES, then the original THCONSE
theorem will add the assertion (FALLIBLE SOCRATES) to the data base.
In English, this means “something is fallible if it is already known to be
human, or if it can be shown to be human.” (The $ and ? are variable
prefixes used for pattern matching; see Chapter 5.)
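
The behavior of this consequent theorem can be sketched in Python. This is only an illustration under stated assumptions, not PLANNER's actual matcher: the names `match`, `thgoal`, and `fallible_theorem` are hypothetical, pattern variables are written "?X" rather than $?X, and the goal is satisfied only by direct lookup (the option of calling a further consequent theorem is left out).

```python
# Illustrative sketch only: not PLANNER's real matcher or control structure.
# Pattern variables are written "?X" here (an assumption), not $?X.

def match(pattern, fact, bindings):
    """Return bindings extended so that pattern equals fact, or None."""
    if len(pattern) != len(fact):
        return None
    extended = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def thgoal(pattern, database, bindings):
    """Satisfy a goal by finding one matching assertion (no theorems tried)."""
    for fact in sorted(database):
        found = match(pattern, fact, bindings)
        if found is not None:
            return found
    return None


def fallible_theorem(database, bindings):
    """Hypothetical analogue of the THCONSE theorem for (FALLIBLE $?X)."""
    found = thgoal(("HUMAN", "?X"), database, bindings)
    if found is None:
        return None                      # the theorem fails
    database.add(("FALLIBLE", found["?X"]))
    return found


database = {("HUMAN", "SOCRATES"), ("GREEK", "SOCRATES")}
result = fallible_theorem(database, {})
```

Here `result` binds the variable to SOCRATES and the data base gains the assertion (FALLIBLE SOCRATES), mirroring the English reading given above.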

Similarly, PLANNER makes use of “antecedent” theorems to change
assertions in a data base automatically whenever certain other assertions
are added or erased. Thus,

(THANTE  (X  Z)  (LIKES  $?X  $?Z) 

(THASSERT  (HUMAN  $?X))) 

is an antecedent theorem (program) which states that whenever an
assertion of the form (LIKES $?X $?Z) is added to a data base, the assertion
(HUMAN $?X) should also be added.
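
An antecedent theorem behaves like a trigger attached to the data base. The following Python sketch illustrates the idea for the LIKES example; the class `DataBase`, its methods, and the "?X" variable syntax are assumptions made for illustration and do not reproduce PLANNER's implementation. Only triggering on addition is shown, not on erasure.

```python
# Illustrative sketch only; names and the trigger mechanism are assumptions.

def match(pattern, fact):
    """Return variable bindings if pattern matches fact, else None."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


class DataBase:
    def __init__(self):
        self.assertions = set()
        self.antecedent_theorems = []    # list of (pattern, action) pairs

    def add_antecedent(self, pattern, action):
        self.antecedent_theorems.append((pattern, action))

    def thassert(self, fact):
        """Add a fact, then run every antecedent theorem it triggers."""
        if fact in self.assertions:
            return
        self.assertions.add(fact)
        for pattern, action in self.antecedent_theorems:
            bindings = match(pattern, fact)
            if bindings is not None:
                action(self, bindings)


db = DataBase()
# Whenever (LIKES ?X ?Z) is added, also assert (HUMAN ?X).
db.add_antecedent(("LIKES", "?X", "?Z"),
                  lambda d, b: d.thassert(("HUMAN", b["?X"])))
db.thassert(("LIKES", "SOCRATES", "PLATO"))
```

After the last call the data base contains both (LIKES SOCRATES PLATO) and (HUMAN SOCRATES), although only the first was asserted explicitly.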

Thus,  a  PLANNER  theorem  is  capable  of  acting  as  a  goal-oriented, 
nondeterministic  program;  it  can  stipulate  various  goals  for  the  computer 
without  stipulating  exactly  how  the  computer  must  try  to  achieve  them. 
PLANNER  theorems  are  an  example  of  pattern-directed  plans. 

Furthermore, PLANNER includes the ability to back up a plan if a
pattern match proves to be unsuccessful. Thus, suppose our data
base includes the following simple assertions:

(HUMAN  TURING) 

(HUMAN  SOCRATES) 

(GREEK  SOCRATES) 

and  the  theorem: 

(THCONSE  (X)  (FALLIBLE  $?X) 

(THGOAL  (HUMAN  $?X))) 

We  can  search  the  data  base  to  answer  the  question  “Is  there  a  fallible 
Greek?”  by  evaluating  the  PLANNER  program: 

(THPROG (X) (THGOAL (FALLIBLE $?X) (THTBF THTRUE))
            (THGOAL (GREEK $?X)))

This expression will have a successful execution, and return a value
for X, only if both THGOALs can be satisfied. When PLANNER attempts
to satisfy the first THGOAL, it will make use of the THCONSE theorem to
prove the validity of asserting that something (the value matched to X)
is fallible; after that, PLANNER will attempt to satisfy the second THGOAL,
by trying to prove or find an assertion that the same thing (the same
value of X) is Greek. In satisfying the first THGOAL, evaluation of the
THCONSE theorem will cause the pattern matcher to attempt to match the
pattern rule (HUMAN $?X) against an assertion in the data base. Suppose
that the pattern matcher first matches this pattern rule with the
assertion (HUMAN TURING), making TURING the value of X; the
THCONSE theorem will then add the assertion (FALLIBLE TURING) to
the data base, and control will return to the THPROG, which will attempt
to satisfy its second THGOAL by either finding, or proving the validity of,
the assertion (GREEK TURING). This attempt will fail, because the assertion
(GREEK TURING) does not appear in the data base and there are
no theorems in the data base which could be used to add such an assertion
to the data base.

The failure of this THGOAL will cause the
THPROG to back up, and attempt to resatisfy its first THGOAL with a different
value for X; the THCONSE theorem will be re-executed, and its
THGOAL will call upon the pattern matcher to once again match the pattern
rule (HUMAN $?X) with an assertion in the data base. However,
this time the pattern rule will be matched with the assertion (HUMAN
SOCRATES), and the new value for X will be SOCRATES. The THCONSE
will succeed, and the assertion (FALLIBLE SOCRATES) will be added to
the data base, and so the first THGOAL of the THPROG will succeed again.
Lastly, the THPROG will again attempt to satisfy its second THGOAL.
This attempt will succeed, because the assertion (GREEK SOCRATES) will
be found in the data base. And so, the THPROG itself will terminate
execution successfully, and return a value for X that is the answer to
our question.
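
The walkthrough above can be imitated with Python generators, which resume where they left off when a later step fails. This is a sketch under stated assumptions rather than PLANNER's mechanism: all function names are hypothetical, the data base is an ordered list so that (HUMAN TURING) is tried first as in the text, and the sketch retracts (FALLIBLE TURING) on backup, which is one reading of a decision being "undone."

```python
# Sketch under stated assumptions: generators stand in for PLANNER's
# failure-driven backup. The list order fixes which assertion matches first.

def thgoal_human(database):
    """Yield each X with (HUMAN X) asserted, one per (re)satisfaction."""
    for fact in list(database):          # copy: the data base changes below
        if fact[0] == "HUMAN":
            yield fact[1]


def thgoal_fallible(database, trace):
    """Consequent theorem: X is fallible if X can be shown to be human."""
    for x in thgoal_human(database):
        fact = ("FALLIBLE", x)
        database.append(fact)
        trace.append(("assert", fact))
        yield x
        # Resumed here only if a later goal failed: undo the assertion
        # (an assumption about what backing up should retract).
        database.remove(fact)
        trace.append(("erase", fact))


def thgoal_greek(database, x):
    """Second goal: succeed only if (GREEK x) is already asserted."""
    return ("GREEK", x) in database


def fallible_greek(database, trace):
    """Analogue of the THPROG: first goal, then second; back up on failure."""
    for x in thgoal_fallible(database, trace):
        if thgoal_greek(database, x):
            return x
    return None


database = [("HUMAN", "TURING"), ("HUMAN", "SOCRATES"), ("GREEK", "SOCRATES")]
trace = []
answer = fallible_greek(database, trace)
```

The `trace` list records the order of events: TURING is tried and retracted before SOCRATES is found, and `answer` is the value returned for X.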

This feature of PLANNER (known as its hierarchical control structure)
is extremely general. In essence, any decision made during the
evaluation of a PLANNER theorem can be undone, if failures “back up”
to the point where it was originally made. The generality of this feature
has been criticized by some researchers (Sussman and McDermott,
1972), who claim that it is often very inefficient to rely on such a strict
depth-first mechanism, and that such a control structure is difficult for
human programmers to use without confusion, especially when writing
large PLANNER programs. Recently, Sussman and McDermott have
implemented a programming language known as CONNIVER, which is
similar to PLANNER (in that programs are pattern-directed), but which
attempts to provide a more flexible, explicit means of specifying the
backup one wants to occur. (Also see Bobrow and Wegbreit, 1972.)

PLANNER has many other important features. The language contains
as “primitive functions” many procedures that must be written
out in detail within other languages. Knowledge in PLANNER is stored
in a generalized “associative memory” (a graphlike structure with
labeled nodes and arcs; see Chapter 7). Because PLANNER programs
are themselves list structures, it is possible for such programs to be
created, changed, or executed by other (and in some cases the same)
PLANNER programs. PLANNER theorems are not restricted to first-order
predicate calculus, nor do they even necessarily correspond very simply
to formulas in higher-order predicate calculus. Thus, a PLANNER
theorem might state: “do not use induction on the same variable twice,”
or “there exist R and Y such that R(Y,TURING) implies Y(TURING).”
Predicates may be quantified or included within other predicates.
PLANNER is currently being used as the inference mechanism for programs
that “understand” natural language (see Chapter 7) and find
descriptions of visual scenes. For the interested reader, Figure 6-10
shows a PLANNER program (Orban’s Monkey) for solving the Monkey-and-Bananas
Problem, much of which should be understandable from
the discussion thus far. The PLANNER genus is probably the most
natural set of programming languages yet developed for the ultimate
writing of reasoning programs.

However,  barring  the  aspect  of  higher-order  predicate  calculus, 
there  is  no  direct  comparison  between  theorem-proving  systems  written 
in PLANNER and the resolution-based theorem provers. The purposes
behind these two approaches to theorem proving appear to be somewhat
different. On the one hand, resolution-based theorem provers are
designed to be general and complete programs for proving and disproving
theorems within mathematical theories. Though we have discussed
ways in which they can be designed to take account of the
semantic  content  of  mathematical  theories  (e.g.,  Kling’s  analogy 
generator),  the  primary  accent  in  their  development  has  been  a  con¬ 
centration  on  their  completeness  and  soundness;  that  is,  on  proving 
their  applicability  to  any  mathematical  system  and  increasing  their 
efficiency  as  much  as  possible  without  relinquishing  that  applicability 
(note  6-5). 

PLANNER, on the other hand, provides a framework in which it is
possible  to  write  very  sophisticated  programs  for  special-purpose  types 
of  theorem  proving.  There  are  many  types  of  information  processing 
and  problem  solving  that  involve  logical  deduction,  or  theorem  proving, 
without  requiring  full  completeness  or  generality.  When  the  types  of 
questions,  or  problems,  or  theorems  to  be  proved  can  be  anticipated 
in  advance,  one  can  sometimes  write  a  special-purpose  program  to  deal 

(SETQ MONKEY '(THGOAL (MONKEY GETS BANANAS) (THTBF THTRUE)))
(THASSERT (CLIMBABLE BOX))
(THASSERT (BOX AT A))
(THASSERT (MONKEY AT B))
(THASSERT (BANANAS AT C))
(THASSERT (MONKEY OFF BOX))

(DEFPROP REACH (THCONSE (XYZ) (MONKEY GETS BANANAS)
    (THASSERT (MONKEY WANTS BANANAS))
    (PRINT '(THE MONKEY THINKS HE WANTS SOME BANANAS))
    (THOR
        (THGOAL (BANANAS AT (THV XYZ)))
        (THFAIL THEOREM '(YES. WE HAVE NO BANANAS)))
    (THASSERT (MONKEY AT (THV XYZ)) (THPSEUDO) (THTBF THTRUE))
    (THOR
        (THAND
            (THGOAL (MONKEY AT (THV XYZ)))
            (THGOAL (MONKEY ON BOX)))
        (THFAIL THEOREM '(MONKEY DIDN'T MOVE. MONKEY NOT WELL)))
    (THERASE (MONKEY WANTS BANANAS))
    (PRINT '(MONKEY GETS BANANAS))
    (THSUCCEED THEOREM 'SUCCESS))
 THEOREM)
(THASSERT REACH)

(DEFPROP MOVEBOX (THANTE (X Z Q) (BOX AT (THV X))
    (THGOAL (BOX AT (THV Z)))
    (THOR
        (THAND (EQUAL (THV X) (THV Z))
               (THSUCCEED THEOREM))
        T)
    (THGOAL (MONKEY AT (THV Q)))
    (THOR (THOR
              (THGOAL (MONKEY OFF BOX))
              (THAND
                  (THNOT (THGOAL (MONKEY ON BOX)))
                  (THASSERT (MONKEY OFF BOX))))
          (THAND
              (THERASE (MONKEY ON BOX))
              (THASSERT (MONKEY OFF BOX))
              (PRINT '(MONKEY NOTICES HE IS ON THE BOX))
              (PRINT '(MONKEY GETS OFF THE BOX))))
    (THOR (EQUAL (THV Q) (THV Z))
          (THASSERT (MONKEY AT (THV Z)) (THPSEUDO) (THTBF THTRUE)))
    (THERASE (BOX AT (THV Z)))
    (THASSERT (BOX AT (THV X)))
    (THERASE (MONKEY AT (THV Z)))
    (THASSERT (MONKEY AT (THV X)))
    (PRINT (LIST 'MONKEY 'MOVES 'BOX 'FROM (THV Z) 'TO (THV X)))
    (THSUCCEED THEOREM))
 THEOREM)
(THASSERT MOVEBOX)


Figure 6-10. Orban’s Monkey. (Written by Richard Orban; published as
an example in Baumgart, 1972. Reprinted with permission.)

(DEFPROP CLIMB (THANTE (X Y Z W S Q) (MONKEY AT (THV X))
    (THGOAL (MONKEY AT (THV Q)))
    (PRINT (LIST 'MONKEY 'IS 'AT (THV Q)))
    (THCOND ((THGOAL (MONKEY WANTS (THV Y))) (THGO B)) (T T))
  A (THAND (THOR
               (THGOAL (MONKEY OFF BOX))
               (THAND
                   (THERASE (MONKEY ON BOX))
                   (THASSERT (MONKEY OFF BOX))
                   (PRINT '(MONKEY NOTICES HE IS ON THE BOX))
                   (PRINT '(MONKEY CLIMBS OFF THE BOX))))
           (THOR
               (THGOAL (MONKEY AT (THV X)))
               (THAND
                   (THERASE (MONKEY AT (THV Q)))
                   (THASSERT (MONKEY AT (THV X)))
                   (PRINT (LIST 'MONKEY 'GOES 'FROM (THV Q) 'TO (THV X)))))
           (THSUCCEED THEOREM 'SUCCESS))
    (THFAIL THEOREM (PRINT '(WHAT MONKEY ?)))
  B (PRINT (LIST 'THE 'MONKEY 'WANTS 'SOME (THV Y)))
    (THGOAL ((THV Y) AT (THV S)))
    (PRINT (LIST 'MONKEY 'NOTICES 'THAT (THV Y) 'ARE 'AT (THV S)))
    (THOR (EQUAL (THV X) (THV S))
          (THGO A))
    (THOR
        (THAND
            (THGOAL ((THV W) AT (THV Z)))
            (THGOAL (CLIMBABLE (THV W)))
            (PRINT (LIST 'MONKEY 'NOTICES 'A (THV W) 'AT (THV Z))))
        (THFAIL THEOREM
            (PRINT '(ALONE IN THE WORLD. WITH OUT A FRIEND))))
    (THOR
        (EQUAL (THV Z) (THV S))
        (THASSERT ((THV W) AT (THV S)) (THPSEUDO) (THTBF THTRUE)))
    (THOR
        (THGOAL (MONKEY AT (THV S)))
        (THAND
            (THERASE (MONKEY AT (THV Q)))
            (THASSERT (MONKEY AT (THV S)))
            (PRINT (LIST 'MONKEY 'GOES 'FROM (THV Q) 'TO (THV S)))))
    (THAND
        (THOR
            (THERASE (MONKEY OFF (THV W)))
            T)
        (THOR
            (THAND
                (THASSERT (MONKEY ON (THV W)))
                (PRINT (LIST 'MONKEY 'CLIMBS 'ON (THV W))))
            (PRINT '(MONKEY ALREADY ON BOX. BUT YOU KNEW THAT))))
    (THSUCCEED THEOREM))
 THEOREM)
(THASSERT CLIMB)


Figure 6-10. (Continued)

only with these questions, problems, or theorems, according to a predetermined
strategy. PLANNER provides a way of writing special-purpose
programs for theorem proving that make use of predetermined strategies.
One of the most valuable features of PLANNER is that these
predetermined strategies can be, to some extent, self-developing. It is,
of course, possible to write complete, general theorem-proving strategies
in PLANNER, but so far its most impressive uses have been of a
special-purpose nature (e.g., the use of PLANNER and the similar language
PROGRAMMAR in Winograd’s English-understanding program).

It  is  possible  that  the  future  will  see  some  sort  of  hybridization  be¬ 
tween  PLANNER-based  and  resolution-based  theorem  provers.  PLAN¬ 
NER  and  similar  languages  may  eventually  provide  the  notation  for 
designing  reasoning-programs  that  will  process  information  according 
to  special  strategies.  These  strategies  may  specify  conditions  in  which 
general-purpose  theorem  provers,  perhaps  resolution-based,  will  be 
used. 

Automatic  Programming 

Green (1969a), and Waldinger and Lee (1969), wrote theorem-proving
programs that write simple programs in LISP (see McCarthy et
al., 1962, or Weissman, 1967). Both programs are based on the
resolution  process  for  theorem  proving.  A  brief  description  of  the 
nature  of  these  programs  is  given  here.  A  complete  discussion  is,  of 
course,  given  in  the  papers  by  the  authors.  Nilsson  (1971,  pp.  201- 
205)  has  also  reviewed  Green’s  results. 

First, a few words about LISP are probably necessary. LISP is a
programming language for writing programs that manipulate symbolic
expressions known as list structures. A list structure is a list whose
elements may be lists, or lists of lists, etc. Thus, a general definition of
list structures is: “X is a list structure if X is an atom, or X is an ordered
sequence of zero or more list structures.” An atom is a string of symbols.
For example, A, ABC, JMC are all atoms. A list structure is usually denoted
by a pair of parentheses enclosing the sequence of its elements.
Thus, (A (A BC) (ABC (JMC))) is a list structure. The empty list, which
does not contain any elements, is denoted by ( ) or by the atom NIL.
List structures may be reentrant; that is, they may contain themselves
as elements. Thus, X = (A X) = (A (A X)) = . . . is a list structure.
LISP provides a collection of primitive functions for manipulating list
structures. These functions can be used to make more complex programs.
Finally, a program written in LISP is itself a list structure. Thus,
programs in LISP can be designed to create and manipulate other programs.
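
The recursive definition of a list structure translates almost directly into code. The sketch below is illustrative only: it models atoms as Python strings and list structures as Python lists (both are assumptions, not LISP's representation), and the function name `is_list_structure` is hypothetical. A set of visited lists keeps the check from looping forever on reentrant structures.

```python
# Minimal sketch: atoms are modelled as Python strings and list structures
# as Python lists. Both modelling choices are assumptions for illustration.

def is_list_structure(x, _seen=None):
    """X is a list structure if X is an atom, or an ordered sequence of
    zero or more list structures."""
    seen = set() if _seen is None else _seen
    if isinstance(x, str):
        return True                       # an atom
    if isinstance(x, list):
        if id(x) in seen:
            return True                   # reentrant: already being checked
        seen.add(id(x))
        return all(is_list_structure(e, seen) for e in x)
    return False


example = ["A", ["A", "BC"], ["ABC", ["JMC"]]]   # (A (A BC) (ABC (JMC)))
reentrant = ["A"]
reentrant.append(reentrant)                      # X = (A X)
```

Both `example` and `reentrant` satisfy the definition, as does the empty list; a list containing something that is neither an atom nor a list does not.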

However,  the  ability  to  write  programs  that  create  programs  is  a 
solution  to  only  the  simplest  problem  of  automatic  programming.  In 
general,  what  we  desire  are  programs  that  create  correct  programs, 
where  “correctness”  is  determined  by  a  program’s  ability  to  compute  a 
given function. That is, we desire a “program-writing program” P,
which, when given a description of a function f, produces a program
F that computes the value of f(x) for any given value of x for which
f(x) is defined.

There  are  many  ways  of  describing  functions.  Programs  are 
themselves  descriptions  of  functions,  and  it  would,  of  course,  be  trivial 
to  write  a  program  P  that  could  write  programs  if  the  descriptions  of 
functions  given  to  it  were  already  in  program  form.  Usually,  we  must 
assume that P is given some less explicit description of the function f
than an actual program for f.

Probably the least explicit way of describing a function f is to
specify a predicate, say R(x,y), such that R(x,y) is true if and only
if f(x) is defined and equals y. It is often possible to specify such a
predicate R associated with a function f without specifying a program
that computes f. Indeed, it is often much easier to specify predicates
than programs. For example, suppose that x and y are variables that
may have as their values any finite sequences of natural numbers. Thus,
x might equal (4 1 3) and y might equal (10 11 91). We say a sequence
x is sorted iff the elements of x are arranged in ascending numerical
order.* Thus, x = (4 1 3) is not sorted, whereas y = (10 11 91) is
sorted. Given this notion of “sorted,” we can give the following description
of a function sort: For any sequence x, sort(x) is a sequence containing
the same elements as x, and sort(x) is sorted. Thus, sort((4 1 3))
would be (1 3 4). However, this description of the function
“sort” does not specify a procedure for computing the function. It
merely specifies a relation holding between x and sort(x): they must
both contain the same elements and sort(x) must be sorted. In other
words, it is a test we can apply to any proposed procedure to see if it does
indeed compute the function “sort” (the test is: “choose any sequence
x; if the procedure when applied to x does not produce a sequence y
such that y contains the same elements as x and y is sorted, then the
procedure fails the test”). The description here gives no explicit information
as to how one should go about producing sort(x) if one is
given an arbitrary sequence x.

* We can define the property “sorted” by using three relations, “equals,”
“left of,” and “less than”: A sequence x is sorted if for any elements e and e'
of x, e left of e' implies e less than e' or e equals e'.
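
The distinction between a predicate and a procedure can be made concrete. In the Python sketch below, `R` only tests whether y is an acceptable value of sort(x); it says nothing about how to compute one. The names `is_sorted`, `R`, and `passes_test` are hypothetical, and "the same elements" is read as multiset equality, which is an assumption. Note that a finite set of samples can refute a proposed procedure but cannot prove it correct.

```python
from collections import Counter

def is_sorted(x):
    """True iff each element is less than or equal to its right neighbor."""
    return all(a <= b for a, b in zip(x, x[1:]))


def R(x, y):
    """True iff y contains the same elements as x and y is sorted."""
    return Counter(x) == Counter(y) and is_sorted(y)


def passes_test(procedure, samples):
    """Apply the test from the text to a proposed procedure on some samples."""
    return all(R(x, procedure(x)) for x in samples)


samples = [(4, 1, 3), (10, 11, 91), ()]
good = passes_test(lambda x: tuple(sorted(x)), samples)   # a correct procedure
bad = passes_test(lambda x: x, samples)                   # identity fails on (4 1 3)
```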

Given a predicate R(x,y) which describes a function f, Green
(1969) identified four basic problems for the field of automatic
programming, such that each problem can be stated using a first-order
predicate calculus formula and such that each problem can be solved*
by a theorem prover that has the ability to provide constructive proofs
for existential formulas.† The four basic problems follow.

Checking. This problem is stated to a theorem prover, using the
formula R(a,b), where a and b are two specific sequences of numbers.
By proving R(a,b) true or false, a theorem prover “checks” whether
b = f(a). This problem does not require a theorem prover with the
constructive-proof ability.

Simulation. This problem is stated to a theorem prover, using an
expression of the form ∃x R(a,x), where a is some specific sequence of
numbers. By providing a constructive proof of the truth of this formula,
a theorem prover “simulates” a program that sorts the sequence a;
that is, it computes the value of f(a).

Verifying. This problem is stated using the formula ∀x R(x,G(x)),
where G is a program provided to the theorem prover by the person
(or machine) who wants the problem solved. By proving the formula
true, the theorem prover verifies that G correctly computes the function
described by R. By constructively proving the formula false, the theorem
prover shows that G is not a correct program for the function described
by R, and the theorem prover provides a value of x for which G
needs “debugging.”

Program Writing. The formula for this problem is ∀x∃y R(x,y).
By constructively proving that this formula is true, a theorem prover
can provide a program for the function f described by R. By constructively
proving the formula is false, a theorem prover would find
a value of x for which f(x) would not be defined.
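
The first three problems can be imitated, very crudely, by exhaustive search in place of a theorem prover. The Python sketch below is an illustration under that substitution and is not how QA3 or any resolution prover works: `check`, `simulate`, and `verify` are hypothetical names, `simulate` searches permutations for a witness, and `verify` only examines sequences up to a small bound, so a result of `None` means "no counterexample found," not "proved." Program writing is not sketched, since it requires constructing a procedure rather than a single value.

```python
from collections import Counter
from itertools import permutations, product

def R(x, y):
    """True iff y has the same elements as x and y is in ascending order."""
    return Counter(x) == Counter(y) and all(a <= b for a, b in zip(y, y[1:]))


def check(a, b):
    """Checking: decide R(a, b) for two specific sequences."""
    return R(a, b)


def simulate(a):
    """Simulation: exhibit an x with R(a, x), found by brute-force search."""
    for candidate in permutations(a):
        if R(a, candidate):
            return candidate
    return None


def verify(G, max_len=3, values=(0, 1, 2)):
    """Verifying, bounded: return an x where G fails, or None if none found."""
    for n in range(max_len + 1):
        for x in product(values, repeat=n):
            if not R(x, tuple(G(x))):
                return x                  # a value of x for which G needs debugging
    return None


witness = simulate((4, 1, 3))
counterexample = verify(lambda x: x)      # the identity is not a sorting program
```

The constructive character of the proofs described above corresponds here to returning an actual witness or counterexample rather than a bare true or false.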

* . . . if it is decidable, and if the theorem prover has “infinite time and
resources.”

† See the preceding section for a discussion of theorem provers that provide
constructive proofs for existential formulas.

Green considered in detail the use of an example-constructing
theorem prover both to construct a program that computes a function
described by a relation R and to prove the correctness of the program
constructed. The theorem-proving program he used was QA3, and the
program he attempted to have it construct was one that sorted an
arbitrary finite sequence of numbers. (Green used a different relation
R to describe the function “sort.”) In addition to axiom clauses
describing the relation R, QA3 must be given clauses that describe the
primitive functions of the “target language” (in this case, LISP) in which
the program for the function described by R is to be written. Waldinger
and Lee’s (1969) program, known as PROW, has built into it the
axioms describing the primitive functions of LISP. PROW contains a
special subprogram (which is not a theorem prover) for converting
programs from one language into another.

In  order  to  develop  programs  that  have  loops  or  are  recursive, 
both  theorem  provers  must  be  given  axioms  for  mathematical  induc¬ 
tion  because,  in  general,  one  cannot  specify  an  upper  bound  for  the 
number  of  steps  that  might  be  required  by  an  execution  of  such  a 
program.  It  is  therefore  not  possible  to  prove  that  a  given  program  is 
correct  by  tracing  through  all  possible  executions  of  that  program. 
Rather,  a  theorem  prover  must  show  that 

1. The program computes the correct value of f(x) for some
value of x, say, x = a.

2. There is a function s such that if the program computes the
correct value of f(x) for a given value of x, then it also
computes the correct value of f(y) for y = s(x).

3. For any possible value of x there is a number n such that
x = s(s(s(. . . (s(a)) . . . ))), where s is applied n times.

The function s is known as the successor function utilized by the inductive
proof.* Proving condition 3 establishes that any possible value
of x is, for some n (which may be dependent on x), an “nth successor”
of a. Proving conditions 1 and 2 establishes that the program computes
the correct value of f for a and for any nth successor of a. Thus, the proof
of the three conditions establishes that the program will compute the
correct value of f(x) for any possible value of x.
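
The three conditions and the conclusion drawn from them can be written as a single schema. The notation is an assumption introduced for this summary: P(x) abbreviates "the program computes the correct value of f(x)," and s^n denotes n applications of the successor function s, with n = 0 giving a itself.

```latex
% P(x): "the program computes the correct value of f(x)" (abbreviation assumed here).
% s^{n}(a): the successor function s applied n times to a; s^{0}(a) = a.
\[
\Bigl[\, P(a) \;\land\; \forall x\,\bigl(P(x) \rightarrow P(s(x))\bigr)
\;\land\; \forall x\,\exists n\,\bigl(x = s^{n}(a)\bigr) \Bigr]
\;\rightarrow\; \forall x\, P(x)
\]
```

The three conjuncts correspond to conditions 1, 2, and 3 in that order.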

In all work to date on automatic program writing, both s and the
proof  of  condition  3  are,  in  effect,  given  to  the  theorem  prover.  The 
correct  choice  of  a  successor  function  and  the  proof  of  its  validity  are 
at  the  moment  too  difficult  for  automatic  program  writers.  Currently, 
automatic  program  writers  are  capable  of  proving  conditions  1  and  2 
(given  s)  only  for  programs  that  are  very  simple,  such  as  a  program 
that  sorts  an  arbitrary  sequence  of  numbers. 

* The function s is often generalized to produce a set of possible successors
to x. For this generalization the identity sign (=) in axioms 2 and 3 should be
replaced by “is an element of” (∈).

However, the fact that inductive proofs can sometimes be accomplished
by theorem provers is significant, especially when one
investigates  the  extent  to  which  people  have  been  able  to  use  inductive 
proofs  to  show  properties  (such  as  “correctness”)  of  computer  pro¬ 
grams.  Floyd,  McCarthy  and  Painter,  Manna,  and  others  showed  that 
mathematical  induction  can  be  used  to  prove  the  correctness  of  a  variety 
of  computer  programs,  including  compilers  and  nondeterministic  pro¬ 
grams  (note  6-6).  The  discussion  of  theorem-proving  programs  will 
be  left  at  this  point,  with  the  observation  that  the  problems  of  program 
writing  can  at  least  be  stated  formally  and  can  often  be  solved  in  a 
formal  manner  by  human  beings. 


NOTES 

6-1.  Some  of  those  responsible  for  the  development  of  predicate  calculus 
and early work on meta-mathematics include Boole, Cantor, Russell, Whitehead,
Lewis, Dedekind, Peano, Frege, Zermelo, Hilbert, Brouwer, Kronecker,
Poincaré, Tarski, Skolem, and Gödel. The Bibliography contains selected
references  to  current  texts  on  mathematical  logic  by  Kleene,  Church,  Prior, 
Quine,  Shoenfield,  Wang  and  others.  (Also  see  Benacerraf  and  Putnam, 
1964;  van  Heijenoort,  1967.) 

6-2.  Many-valued  logics,  modal  logics,  and  fuzzy  logics  have  often  been 
suggested  as  the  most  realistic  and  desirable  frameworks  within  which  to  con¬ 
struct  theorem  provers.  These  logics  differ  from  predicate  calculus  mainly 
in  the  inference  rules  they  provide;  their  inference  rules  do  not  require  that 
a  sentence  be  completely  and  exactly  true  in  order  for  it  to  be  used  in 
deriving other sentences. Rather, sentences are allowed to have many different
values besides “true” and “false.” Thus, in fuzzy logic, the truth value
of  a  sentence  may  be  any  real  number  between  zero  and  one,  inclusive 
(“false”  and  “true,”  respectively).  Space  does  not  permit  a  detailed  treat¬ 
ment  of  these  logics;  the  interested  reader  is  referred  to  the  works  of  Acker¬ 
man  (1967),  McCarthy  and  Hayes  (1968),  Prior  (1957),  Quine  (1961), 
Feys  (1965),  Zadeh  (1965,  1968),  and  Tsichritzis  (1968)  cited  in  the 
Bibliography.  Recently,  R.C.T.  Lee  (1971)  showed  that  the  resolution  prin¬ 
ciple  developed  in  this  chapter  can  be  used  within  a  formalization  of  fuzzy 
logic.  The  discussion  of  “meaning”  presented  in  Chapter  7  is  relevant  to 
many-valued  logics. 
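
As a rough illustration of truth values that range over the interval from zero to one, the following Python sketch uses the minimum, maximum, and complement connectives commonly associated with Zadeh's fuzzy sets. The function names are hypothetical, and the choice of connectives is an assumption; the note itself does not say which connectives a given fuzzy logic would use.

```python
def _check(value: float) -> float:
    """Reject truth values outside the closed interval [0, 1]."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"truth value {value!r} is not in [0, 1]")
    return value


def fuzzy_not(a: float) -> float:
    """Complement: 'false' (0) becomes 'true' (1) and vice versa."""
    return 1.0 - _check(a)


def fuzzy_and(a: float, b: float) -> float:
    """Conjunction taken as the smaller of the two truth values (assumed)."""
    return min(_check(a), _check(b))


def fuzzy_or(a: float, b: float) -> float:
    """Disjunction taken as the larger of the two truth values (assumed)."""
    return max(_check(a), _check(b))
```

When every argument is exactly 0 or 1, these connectives agree with the ordinary two-valued truth tables, which is one reason they are a common starting point.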

6-3. Green called this technique answer extraction. The present author
prefers  to  use  the  phrases  “constructive-proof  generation”  and  “example 
construction,”  since  these  do  not  imply  linguistic  ability,  an  aspect  of  artifi¬ 
cial  intelligence  that  we  have  not  yet  discussed.  However,  it  should  be  noted 
that  Green’s  early  papers  were  largely  concerned  with  question-answering 
and the ability of machines to use natural languages. Also, the phrase
“answer extraction” is in fairly common usage. Chapter 7 discusses the use of
theorem  provers  within  language-understanding  systems. 

6-4. A third approach to planning was suggested by Kling, within the
context of proving theorems by analogy. Suppose the theorem prover is
given a proof for a theorem T' and required to find a proof for an analogous
theorem T, and suppose that the proof for T' requires the establishment of
certain “smaller” theorems or lemmas. The proofs of these lemmas are also
given to the theorem prover. Kling suggested using an analogy generator to
produce analogs for the lemmas associated with T', and having the theorem
prover attempt to find proofs for these analogs. If proofs were found, then
the clauses associated with the analogs could be used in the data base for T.
To  the  present  author’s  knowledge,  Kling  has  not  yet  implemented  this 
method. Indeed, his analysis (1971a, pp. 145-148) suggested that ZORBA-I
may  not  be  suitable  for  such  an  implementation.  However,  the  idea  indi¬ 
cates  a  way  in  which  problem-reduction  techniques  might  be  used  “by 
analogy”  in  theorem  proving. 

6-5.  In  fact,  for  reasons  that  include  both  theoretical  and  practical  limita¬ 
tions,  no  theorem  prover  can  be  really  complete.  Even  though  a  theorem 
may  be  logically  implied  by  a  set  of  axioms,  we  cannot  guarantee  that  the 
theorem prover will eventually develop a proof for it, because of (1) the
undecidability  of  the  predicate  calculus  and  (2)  the  limitations  of  space  and 
time  which  affect  the  computational  ability  of  any  machine.  (However,  we 
should  note  that  our  first  condition  does  not  hold  for  the  first-order  predi¬ 
cate  calculus;  given  an  arbitrary  sentence  and  a  set  of  axioms,  the  semi- 
decidability  of  the  first-order  predicate  calculus  guarantees  that,  if  the 
sentence  is  logically  implied  by  the  axioms,  a  resolution-based  theorem 
prover, given enough space and time, will eventually find a proof for it;
on the other hand, if there is no proof for the sentence (that is, it is not
logically implied by the axioms), such a theorem prover may not be able to
disprove  the  sentence,  no  matter  how  much  space  and  time  we  give  it.) 

6-6. A good survey of mathematical induction and the subject of automatic
program writing was given by Manna and Waldinger (1970). They
suggested  partial-function  logic  (predicate  calculus  with  “undefined”  as  a 
truth  value;  see  McCarthy,  1963b)  as  the  most  natural  language  for  auto¬ 
matic  program  synthesis.  Other  papers  on  the  subject  have  been  written  by 
Balzer  (1972),  and  Feldman  (1972).  Dijkstra  (1965  et  seq.)  has  devel¬ 
oped  the  paradigm  of  structured  programming  as  a  framework  within 
which  to  prove  the  correctness  of  programs.  Recently,  Scott  (1971)  and 
Milner  (1972)  have  developed  a  mathematical  logic  of  computation  that  is 
of great relevance to this subject. Sussman (1972) describes the general
structure of a CONNIVER program (called HACKER) for automatic program
writing.

EXERCISES

6-1. Show that ∨(U,V) can be rewritten as ¬∧(¬U,¬V).

6-2.  Find  clause-form  equivalents  for  the  following  formulas: 

(a) ∀x(P(x)→∨(P(x),Q(x))).

(b) (¬∀xP(x))→(∃x¬P(x)).

(c) (∀x∃y∧(P(x),Q(x,y)))→∃x∧(P(x),Q(x,x)).

6-3.  Find  most  general  unifiers  for  each  of  the  following  sets  of  literals: 

(a)  {Q(x,a,y),Q(a,x,y)}. 

(b) {P(x,f(x)),P(g(x),a)}.

(c) {R(u,w,f(u)),R(b,x,g(x))}.

(d) {W(z,c,f(y)),W(a,x,z),W(f(y),u,g(x))}.

6-4.  Use  the  resolution  principle  to  derive  contradictions  from  the  negations  of 
each  of  the  following  predicate  calculus  tautologies: 

(a) ∀x(P(x)→P(x)).

(b) (¬∃xP(x))→(∀x¬P(x)).

(c) (∀x∨(P(x),Q(x)))→(∨((∀xP(x)),(∃xQ(x)))).

6-5.  Construct  a  predicate  calculus  formalization  for  the  Missionaries-and- 
Cannibals  Problem  (Exercise  3-2);  give  a  resolution-based  proof  that  it  is  solvable 
and  use  the  example-construction  technique  to  find  a  solution. 

6-6.  Present  a  predicate  calculus  formalization  for  the  Mutilated  Checkerboard 
Problem  (Exercise  3-8),  and  describe  how  it  might  be  used  to  prove  the  checker¬ 
board  cannot  be  covered  by  the  tiles  as  required. 

6-7. (a) Present a predicate calculus formalization for the Confusion-of-Patents
Problem (Exercise 3-3) and give a resolution-based proof that it is solvable.
(b) Use the technique of example construction to find the solution to the problem.

6-8. One nice aspect of the PLANNER “robot calculus” is that it allows a relation
or  a  predicate  to  have  a  variable  number  of  arguments.  Give  some  real-world 
examples  illustrating  such  relations. 

6-9. In the discussion of PLANNER theorems the following statement was
presented:


∃R∃Y[R(Y,Turing)→Y(Turing)]

Find two English words that might plausibly be substituted for R and Y to make
R(Y,Turing)→Y(Turing)
a  “reasonable”  statement. 

6-10. (The King-and-the-Wizards Problem.) (a) Long ago, a wicked king was
searching  for  a  new  wizard  with  whom  to  plot  some  devious  schemes.  He  sum¬ 
moned  to  him  three  wizards  who  seemed  especially  promising,  and  let  them  into  a 
small  room,  which  was  barren  except  for  a  lighted  candle  on  a  table  in  the  middle 
of  the  room.  “Listen  to  me  well,”  he  said.  “In  a  few  minutes  all  of  you  will  be 
blindfolded, and I will paste upon each of your foreheads a uniformly colored
spot  of  black  or  white  paper.  At  least  one  spot  will  be  white.  The  first  of  you  who 
guesses  the  color  of  his  own  spot  will  become  my  new  wizard,  and  ride  in  his 
own  chariot,  with  all  expenses  paid.  The  other  two  of  you  will  be  sent  to  a  terrible 
fate  that  I  shall  not  describe.  None  of  you  will  be  allowed  to  remove  any  of  the 
spots,  and  you  will  each  be  allowed  only  one  guess.”  The  king  then  ordered  his 
guards to blindfold the wizards, proceeded to paste white spots on all the wizards’
foreheads,  and  finally  had  their  blindfolds  removed.  After  a  few  seconds,  one  of 
the  wizards  correctly  identified  the  color  of  the  spot  on  his  forehead.  How  did  he 
know  it?  (b)  Present  a  predicate  calculus  axiomatization  for  the  wizard’s  reason¬ 
ing.  (c)  What  sort  of  thoughts  might  the  other  two  wizards  have  been  thinking? 


“We  could  play  at  questions.” 

— Rosencrantz,  in  Rosencrantz  and 
Guildenstern  Are  Dead.  (Stoppard,  1967) 

“Augustine describes the learning of human language
as if the child came into a strange country and did not understand
the language of the country; that is, as if it already
had a language, only not this one.”

— Wittgenstein,  Philosophical  Investigations. 

“I  find  it  difficult  to  believe  that  whenever  I  see  a  tree 
I  am  really  seeing  a  string  of  symbols.” 

—McCarthy,  in  a  discussion  on  grammatical 
inference  and  pattern  recognition. 

“As  a  concluding  remark:  could  this  art  be  applied  (we 
put  the  question  in  strictest  confidence) — could  it,  we  ask,  be 
applied  to  the  speeches  in  Parliament?” 

— Lewis  Carroll,  Photography  Extraordinary. 

“There  is  of  course  no  restriction  in  the  memory  format 
against  having  concepts  without  English  names,  and  in  fact 
[its]  present  memories  necessarily  include  such  concepts.” 

— Quillian,  describing  the  structure  of 
the  TLC  computer  program.  (Quillian,  1969) 

“Danger  of  tumbling  upwards  be  in  deep-sea.” 

— Protosynthex  III,  a  computer  program. 
(Schwarcz,  Burger,  and  Simmons,  1970) 

“The challenge of programming a computer to use language
is really the challenge of producing intelligence.”

— Winograd,  1971. 

“In  any  case,  these  are  but  steps  toward  more  graphical 
program-description  systems,  for  we  will  not  forever  stay  con¬ 
fined  to  mere  strings  of  symbols.” 

— Minsky,  1970. 

“What  does  meaning  mean?” 

— Anonymous. 

“Imagine a people in whose language there is no such
form of sentence as ‘the book is in the drawer’ or ‘water is
in the glass’, but whenever we should use these forms they
say, ‘The book can be taken out of the drawer’, ‘The water
can be taken out of the glass’.”

—Wittgenstein,  The  Brown  Book. 

“I  have  traveled  more  than  anyone  else,  and  I  have 
noticed  that  even  the  angels  speak  English  with  an  accent.” 

— Mark  Twain. 
INTRODUCTION

This  chapter  is  concerned  with  the  ability  of  machines  to  use 
languages.  We  shall  first  discuss  the  nature  of  language,  both  as  it  is 
used  by  “living  creatures”  and  as  it  is  used  by  “machines”,  giving 
primary  attention  to  two  of  the  most  important  features  that  are  pos¬ 
sessed  by  human  and  computer  languages:  extensibility,  and  self- 
reference.  A  conclusion  will  be  drawn  that,  of  all  the  machines  and 
animals  known  to  man,  computers  belong  to  the  handful  (also  in¬ 
cluding  chimpanzees  and  dolphins)  we  might  plausibly  expect  to  learn 
our  languages. 

Of predominant interest throughout this chapter is the ability of
sentences in a language to have “meaning” to those who use the language.
A sentence that has meaning is said to contain semantic
information.¹ The third section of this chapter will describe how
machines can “understand” and “create” sentences that convey
semantic information, and will discuss computer programs that do this
for sentences written in English. A collection of some of the conversations
people have had with computers will be presented, primarily
oriented toward the ability of computers to solve various kinds of
problems stated in English. The final section takes up more general
questions, relating the problems of language use and development to
those of teaching and learning, pattern perception, and general
problem-solving and reasoning programs.

¹ The “semantic information” of a sentence should not be confused with the
“information” measure described in Chapter 2. (See note 7-7.)


NATURAL  AND  ARTIFICIAL  LANGUAGES 
Definitions 

To  facilitate  matters,  we  need  some  rough  definitions  for  “lan¬ 
guage,”  “sentence,”  and  “meaning.”  These  definitions  will  be  refined 
and  amplified  throughout  the  rest  of  this  chapter. 

A language is a set of sentences that may be used as signals to
convey semantic information. The existence of a signal naturally implies
the existence of an emitter and a receiver (perhaps more than
one) and of some “embodiment,” or means of transmission for the
signal. The meaning of a sentence is the semantic information it conveys.
For a given sentence (signal), this information may vary with
the situation in which it is used; in general, we can think of the meaning
of a sentence as being a description of three things: (1) whatever
causes the sentence to be used; (2) whatever is caused by the use of
the sentence; (3) whatever else is described by the sentence. It is the
task of those who use a sentence (the emitters and receivers) to “understand”
these elements of its meaning. For a computer that uses a
sentence, “understanding” may correspond to making internal
data structures (vectors, lists, graphs, programs, etc.) that model these
elements of the meaning of the sentence. “Communication” is a word
we use to describe processes in which one or more sentences are transmitted
and understood.
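
The three elements of meaning listed above can be pictured as fields of a small data structure. The Python sketch below is only an illustration under stated assumptions: the class name MeaningModel, its field names, and the two example situations are hypothetical and do not come from any program discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeaningModel:
    """One receiver's model of a sentence as used in a particular situation."""
    sentence: str     # the signal itself
    cause: str        # (1) whatever caused the sentence to be used
    effect: str       # (2) whatever the use of the sentence causes
    description: str  # (3) whatever else the sentence describes


def same_signal_different_meaning(a: MeaningModel, b: MeaningModel) -> bool:
    """True when two models share a sentence but differ in some element of meaning."""
    return a.sentence == b.sentence and a != b


# The same signal modeled in two hypothetical situations.
honest = MeaningModel(
    sentence="I have four aces.",
    cause="the speaker is showing a winning hand",
    effect="the other players concede the pot",
    description="the speaker's cards include four aces",
)
bluff = MeaningModel(
    sentence="I have four aces.",
    cause="the speaker wants the others to fold",
    effect="the other players doubt or fold",
    description="a claim about the speaker's cards that may be false",
)

print(same_signal_different_meaning(honest, bluff))
```

The point of the sketch is only that the signal alone does not fix the model; the situation supplies the remaining fields.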

A  few  examples  will  clarify  the  concept  of  “meaning”  that  is  ad¬ 
vocated.  Consider  the  following  sentences: 

A. I have four aces.

B. Our position is 10 miles north of yours.

C. Elect me and I will end the war honorably.

D. Eat cereal X and grow healthy and strong.

E. Why?

F. I love you.

G. People who apply for marriage licenses wearing shorts or
pedal pushers will be denied licenses.²

H. The sum of 18 and 32 is 50.

I. The equation x² = −1 has a solution.

Surely  the  “meaning”  of  these  sentences  is  not  something  fixed  or 
immutable.  The  meaning  of  a  sentence  generally  depends  as  much  on 
who  utters  it,  and  where,  when,  and  to  whom  it  is  uttered,  as  it  depends 
on  the  sentence  itself.  In  understanding  a  sentence,  one  should  attempt 
to model what causes the sentence to be transmitted, what the emitter
of the sentence hopes that it will cause, etc., as well as what the sentence
itself  describes. 

Throughout,  this  chapter  stresses  the  importance  of  model  making 
in  the  processes  of  communication  and  understanding.  However,  the 
student  should  be  warned  that,  especially  for  languages  such  as  English 
and  French,  there  is  no  current,  complete  explanation  for  how  com¬ 
puters  should  go  about  “understanding”  sentences.  The  problems  con¬ 
nected  with  modeling  the  semantic  information  carried  by  sentences 
are  as  deep  and  complex  as  the  situations  these  sentences  may  describe. 
This  chapter  can  do  little  more  than  present  some  of  the  requirements 
that  would  have  to  be  satisfied  by  an  adequate  formalism  for  “models 
of  meaning,”  describe  how  computer  programs  currently  approach  the 
subject,  and  suggest  how  research  might  be  continued  (see  note  7-1). 

It will serve us well to distinguish between two types of languages
called natural and artificial languages. The differences between them lie
mainly in the uses that are made of them, and in the knowledge we
have about them.³ Although both forms of language are of much interest
in themselves, our discussion will center on their relations to
each other, and especially on the ability of artificial languages to
“simulate” natural languages. By natural languages we refer to the
languages that living creatures use for communication, whereas by
artificial languages we mean certain mathematically defined classes of
signals that can be used for communication with machines.

Natural  Languages 

The  natural  languages  constitute  a  very  broad  category,  since 
communication  processes  are  important  to  virtually  every  living  system 
in  existence.  We  may  group  natural  languages  into  two  large  sub¬ 
categories,  and  name  them  cell-level  and  organism-level  natural  lan¬ 
guages. 


² This example sentence is quoted from Kuno (1965).

³ These are not necessarily differences of substance; probably the distinction
between  them  will  become  less  as  our  understanding  increases. 
The cell-level languages are evidently the oldest natural languages.
The emitters and receivers that use these languages are living cells.
The  methods  for  transmission  of  sentences  (signals)  are  primarily 
chemical  and  electrical.  “Sentences”  transmitted  chemically  correspond 
to  molecules  (often  called  “messenger  molecules”),  which  may  act  by 
catalysis  to  affect  processes  in  the  receiver.  One  well-known  group  of 
such  sentences  is  that  of  the  RNA  molecules,  which  typically  carry 
information  between  parts  of  individual  living  cells.  Very  little  is 
known  about  cell-level  “molecular  languages”  except  that  there  are  a 
huge  number  of  molecular  sentences  that  can  have  “meaning”  to  living 
cells.  It  may  be  a  long  time  before  scientists  can  “understand”  them. 
For  more  information  on  these  languages,  see  Pribram  (1971). 

Organism-level  natural  languages  are  much  more  familiar  to  us. 
The  emitters  and  receivers  that  use  these  languages  are  living  organisms 
(animals,  plants,  etc.);  the  means  of  transmission  include  chemical, 
visual, auditory, and tactile techniques. Many species have acquired these
languages,  primarily  to  carry  information  about  food,  danger,  and  sex. 
Typically,  the  language  used  by  the  organisms  of  a  given  species  will 
have only a small number (say, fewer than 100) of sentences or signals,
and  there  will  be  no  provision  within  the  language  for  extending  that 
number.  Usually  the  organisms  which  use  these  languages  do  so  in¬ 
voluntarily,  in  automatic  response  to  the  presence  of  certain  stimuli  in 
their  environments. 

The  only  known  organism-level  natural  languages  that  are  not  so 
limited  are  mankind’s  spoken  and  written  languages  (English,  French, 
Chinese, etc.). In theory, these languages possess an infinite number
of  possible  sentences  that  can  be  used  as  signals  by  people.  However, 
no  one  knows  how  many  of  the  “possible”  sentences  are  “meaningful” 
in  practice.  The  best  we  can  say  is  that  the  number  may  be  “com¬ 
parable”  to  that  of  the  meaningful  molecular  sentences  in  cell-level 
languages. 

One  major  difference  between  human  languages  and  those  used 
by  other  organisms  lies  in  the  structural  nature  of  the  sentences  we  use. 
The  sentences  of  any  human  language  are  essentially  stringlike  struc¬ 
tures  (sequences)  of  words.  Spoken  words  are  themselves  essentially 
stringlike  structures  of  phonemes  (vocally  producible  sounds  that 
constitute  the  “alphabet”  of  the  spoken  language),  whereas  written 
words  are  often  sequences  of  letter-symbols,  which  constitute  the 
alphabet of the written language (note 7-2). Various languages, of
course,  have  different  spoken  and  written  alphabets.  English  has  a 
written  alphabet  of  26  letters  and  a  spoken  alphabet  of  48  phonemes 
(J.  B.  Carroll,  1964). 
Currently, about 5,000 languages and dialects are spoken throughout
the world. The two most commonly spoken languages are Northern
Chinese (or Mandarin) and English, which are used by about 600
million and 350 million people, respectively. English is the language
in most widespread use, being spoken by 10% or more of the population
in 29 countries; it is also the language with the largest vocabulary,
containing about 490,000 words, plus another 300,000 technical terms
(McWhirter and McWhirter, 1971). Estimates vary for the maximum
size of the individual human vocabulary; it is very likely that no
individual uses more than 100,000 words (probably the boundary is
lower,  around  60,000) — normal  literature  written  in  English  makes 
use  of  about  10,000  words,  while  well-educated  conversation  uses  about 
5,000  words.  Of  course  it  is  possible  to  converse  rather  well  using  much 
smaller  vocabularies.  Thus,  “Basic  English”  (C.  K.  Ogden,  1933) 
contains  only  850  words.  The  1971  Guinness  Book  of  World  Records 
reports  that  the  language  with  the  smallest  vocabulary  is  Taki  taki,  a 
South  American  language  that  uses  only  340  words. 

The  sentences  in  a  language  are  always  essentially  sequences  of 
words  from  the  vocabulary  of  that  language,  but,  typically,  not  every 
sequence  of  words  constitutes  a  sentence.  A  set  of  rules  that  allows 
one  to  recognize  the  sequences  of  words  that  are  sentences  in  a 
language is known as a grammar for that language.⁴ Grammars are
said  to  describe  the  structural,  or  syntactic,  nature  of  languages. 
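
To make the idea of a grammar as a set of recognition rules concrete, here is a minimal Python sketch. The grammar, the lexicon, and the function name is_sentence are all hypothetical and chosen only for illustration; they describe a tiny toy language, not English. The sketch assumes that every rule consumes at least one word and that no rule refers back to itself on the left, which keeps the search finite.

```python
from functools import lru_cache

# Hypothetical toy grammar: each category maps to its alternative expansions.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
}

# Hypothetical vocabulary, grouped by word category.
LEXICON = {
    "Det": {"the", "a"},
    "N": {"wizard", "king", "candle"},
    "V": {"sees", "sleeps"},
}


def is_sentence(words) -> bool:
    """Return True if the word sequence can be derived from the category S."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def derives(symbol, start, end):
        # Does `symbol` derive exactly words[start:end]?
        if symbol in LEXICON:
            return end - start == 1 and words[start] in LEXICON[symbol]
        return any(matches(rhs, start, end) for rhs in GRAMMAR[symbol])

    def matches(rhs, start, end):
        # Does the sequence of symbols `rhs` derive exactly words[start:end]?
        head, rest = rhs[0], rhs[1:]
        if not rest:
            return derives(head, start, end)
        # Try every split point, leaving at least one word per remaining symbol.
        return any(
            derives(head, start, mid) and matches(rest, mid, end)
            for mid in range(start + 1, end - len(rest) + 1)
        )

    return derives("S", 0, len(words))


print(is_sentence("the wizard sees the king".split()))
print(is_sentence("wizard the sees".split()))
```

Because this toy recognizer always halts with either True or False, it also rejects the word sequences that are not sentences of its small language; for richer languages that second ability cannot be taken for granted.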

Of  course  one  wants  to  do  more  than  simply  recognize  which 
sequences  of  words  are  sentences  in  a  language;  it  is  of  primary  im¬ 
portance  to  be  able  to  “understand”  the  sequences  one  recognizes.  One 
of  the  major  problems  confronting  linguistics  today  is  development  of 
an  adequate  theory  of  the  relationship  between  the  syntactic  nature  of 
a sentence (or set of sentences) and the semantic information it conveys.
Two important approaches toward a solution of this problem
are the theories of transformational grammar (Chomsky, 1959 et seq.)
and systemic grammar (Halliday, 1961 et seq.). This topic is discussed
in the next section, but for now it is important to note two “trivial”
things:  first,  the  structural  nature  (syntax)  of  a  sentence  helps  one 
determine  its  meaning;  second,  the  meaning  an  emitter  wants  to  convey 
helps  determine  the  structure  of  the  sentences  that  convey  it. 

One of the most valuable aspects of human languages is their
extensibility: The words and sentences of the English language (for
example) are not fixed. Rather, English (like most if not all other
human languages) includes provisions for extending its own use. As is
evident from the preceding paragraphs, the chief way in which English
has been extended has been through the definition of new words.
Another way is the introduction of new symbols (e.g., mathematical
symbols). It is even possible to extend a language by adding to its
syntactic nature. Thus, it is possible to use English sentences to define
(at least partially) the words, sentences, and grammar of another
language, such as German; this is precisely what an introductory
English textbook on German will attempt to do. When English is extended
in this way to include German sentences, the German sentences
may be said to have been embedded in English.

⁴ If the set of rules also enables one to recognize those sequences of words
which are not sentences, then it is said to decide the language. It is possible for
a language to be undecidable, that is, such that no grammar can decide it.

Closely  related  to  the  extensibility  of  human  languages  is  their 
ability  to  be  self-referencing.  An  English  sentence  (for  example,  this 
one)  can  refer  to  itself  or  to  other  sentences  (e.g.,  all  of  the  sentences 
in  this  book).  In  “understanding”  the  preceding  sentence,  one  must 
understand  the  phrase  “this  one”  (which  refers  to  the  entire  sentence 
in  which  it  occurs),  and  the  phrase  “all  of  the  sentences  in  this  book.” 
One  can  find  many  other  types  of  self-reference  exhibited  by  English 
sentences. 

A  third  aspect  of  human  (and  many  other)  languages  which  should 
be  mentioned  is  their  redundancy.  Any  means  of  transmitting  a  signal 
may  involve  some  “noise”  that  will  tend  to  distort  or  degrade  the  signal. 
To  convey  the  semantic  information,  one  should,  in  effect,  transmit  the 
signal  several  times,  because  it  is  very  unlikely  that  random  noise  will 
degrade  the  signal  the  same  way  every  time.  The  receiver  can  re¬ 
construct  the  original  signal  by  adopting  a  “majority  vote”  policy  when 
comparing  the  signals  he  receives.  Another  way  of  using  redundancy 
in  an  alphabetic  language  is  to  use  more  symbols  than  are  needed  to 
represent  each  word;  with  26  letters  one  could,  for  example,  represent 
each  of  10  million  words  uniquely  by  a  series  of  5  letters.  In  fact, 
English  uses  considerably  more  letters  than  are  necessary  to  represent 
each  of  its  words.  Finally,  one  can  also  obtain  redundancy  in  a  lan¬ 
guage  if  its  grammar  provides  sentences  with  “structural  redundance” 
(see  Cherry,  1957).  The  redundancy  of  English  is  often  quoted  as 
about  50%;  that  is,  an  English  sentence  is  usually  decipherable  even 
if  each  of  its  letters  is  blanked  out  independently  of  the  others  with  any 
probability  up  to  one-half. 
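
Two of the points in the preceding paragraph can be checked with a short Python sketch: the “majority vote” reconstruction of a repeated signal, and the claim that five letters from a 26-letter alphabet suffice to label 10 million words (26 to the fifth power is 11,881,376). The function names and the sample messages are hypothetical, and the sketch assumes that noise only substitutes characters, so that every received copy has the same length.

```python
from collections import Counter


def majority_vote(copies):
    """Rebuild a message by taking the most common character at each position."""
    if not copies:
        raise ValueError("at least one received copy is required")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all received copies must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*copies)
    )


def min_code_length(vocabulary_size: int, alphabet_size: int = 26) -> int:
    """Smallest fixed length that gives every word its own letter sequence."""
    if alphabet_size < 2:
        raise ValueError("the alphabet needs at least two symbols")
    length, capacity = 0, 1
    while capacity < vocabulary_size:
        capacity *= alphabet_size
        length += 1
    return length


# Three noisy receptions of one sentence; each copy is damaged in a different place.
received = ["I have four aces", "I hxve four aces", "I have fouq aces"]
print(majority_vote(received))
print(min_code_length(10_000_000))
```

The vote succeeds here because no position is damaged in more than one of the three copies; if noise struck the same position in most copies, the reconstruction would fail.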

The  importance  of  language  to  the  development  of  human  in¬ 
telligence  is  a  subject  that  deserves  a  great  deal  of  attention,  certainly 
more  than  can  be  offered  in  this  book.  Wherever  people  have  formed 
societies,  they  have  developed  languages.  The  tendency  to  develop  lan¬ 
guages  is  one  of  the  most  important  traits  of  our  species.  One  of  the 
more remarkable things about it is the ability of the young child to
learn the language of the society in which he is raised. Something about
the way in which societies develop languages insures that practically
every child will be able to accomplish the language-learning feat in the
space of a few years. This has led certain scholars (Chomsky, 1966)
to conjecture the existence of a universal grammar that underlies all
human languages and is naturally reflected somehow in the
learning process of each person, to the extent that individuals are
enabled to learn the language of their society without making too many
wrong guesses. Such a grammar would also account for the similarities
in syntax between the various languages that people have developed
(note 7-3).


Sign: Come-gimme
Description: Beckoning motion, with wrist or knuckles as pivot.
Context: Sign made to persons or animals, also for objects out of reach.
Often combined: “come tickle,” “gimme sweet,” etc.

Sign: More
Description: Fingertips are brought together, usually overhead. (Correct ASL
form: tips of the tapered hand touch repeatedly.)
Context: When asking for continuation or repetition of activities such as
swinging or tickling, for second helpings of food, etc. Also used to ask for
repetition of some performance, such as a somersault.

Sign: Up
Description: Arm extends upward, and index finger may also point up.
Context: Wants a lift to reach objects such as grapes on vine, or leaves; or
wants to be placed on someone's shoulders; or wants to leave potty-chair.

Sign: Tickle
Description: The index finger of one hand is drawn across the back of the
other hand. (Related to ASL “touch.”)
Context: For tickling or for chasing games.

Sign: Toothbrush
Description: Index finger is used as brush, to rub front teeth.
Context: When Washoe has finished her meal, or at other times when shown a
toothbrush.

Sign: Cat
Description: Thumb and index finger grasp cheek hair near side of mouth and
are drawn outward (representing cat's whiskers).
Context: For cats.

Sign: Key
Description: Palm of one hand is repeatedly touched with the index finger of
the other. (Correct ASL form: crooked index finger is rotated against palm.)
Context: Used for keys and locks and to ask us to unlock a door.

Sign: Baby
Description: One forearm is placed in the crook of the other, as if cradling a
baby.
Context: For dolls, including animal dolls such as a toy horse and duck.

Sign: Clean
Description: The open palm of one hand is passed over the open palm of the
other.
Context: Used when Washoe is washing, or being washed, or when a companion is
washing hands or some other object. Also used for “soap.”

Figure 7-1. Some signs used reliably by Washoe after 22 months of
training. (Gardner and Gardner, 1969, reprinted with permission.)

[Figure 7-2 image: symbol sequences whose labels include MARY, SARAH, RANDY,
GIVE, INSERT, BANANA, APPLE, PAIL, and DISH.]

Figure 7-2. Examples of the sequences of symbols used by Sarah.
Reprinted from “The Education of Sarah” by David Premack in Psychology
Today Magazine, September 1970. Copyright © Communications/Research/Machines,
Inc.

Efforts  have  been  made  to  teach  human  languages  to  other  animals, 
but  only  recently  have  researchers  achieved  any  success  (note  7-4). 
For  example,  two  chimpanzees  have  been  taught  to  communicate  with 
people by using “sign languages” (Gardner and Gardner, 1969;
Premack,  1970).  One  chimpanzee,  named  Washoe,  has  learned  over 
150  signs  of  the  American  Sign  Language  System,  originally  devised 
for  the  deaf  and  dumb;  this  is  the  language  in  which  words  are  repre¬ 
sented  by  movements  and  configurations  of  an  individual’s  hands  and 
fingers.  Sarah,  the  other  chimpanzee,  has  learned  to  communicate  with 
sentences  that  consist  of  simple  sequences  of  cards  bearing  printed 
symbols.  Figures  7-1  and  7-2  show  some  of  the  signs  and  card 
sequences  used  by  Washoe  and  Sarah.  Neither  chimpanzee  is  able  to 
use  very  long  or  complicated  sequences  of  signs  or  cards,  although 
Washoe  has  been  able  to  invent  a  few  new  signs  that  are  now  used  by 
some  people  learning  the  language. 

Because  our  spoken  and  written  languages  are  so  familiar  and 
important to us, it is customary to call them the “natural languages,” as
distinct  from  other  organism-level  and  cell-level  natural  languages.  So, 
unless  otherwise  specified,  the  phrase  “natural  language”  will  refer 
henceforth  only  to  spoken  and  written  languages  used  by  homo  sapiens. 


Artificial  Languages  and  Programming 
Languages 

Artificial  languages  are  certain  mathematically  defined  classes  of 
signals  that  can  be  used  for  “communication”  with  machines.  Through¬ 
out  the  rest  of  this  section  the  reader  will  be  introduced  to  those 
artificial  languages  that  can  be  used  for  communicating  with  computers. 
These  languages  are  generally  known  as  programming  languages. 
Chapter 6 has already given a brief description of programming
languages (in particular, LISP and PLANNER). Programming languages have
many properties that are analogous to those of natural languages. The
next section reviews the attempts made by AI researchers to “unify”
the  artificial  and  natural  languages;  that  is,  to  design  machines  with  an 
ability  to  “communicate”  in  both  kinds  of  languages.  Here,  however, 
the  emphasis  is  on  languages  that  are  currently  conventional  for  pro¬ 
gramming  (communicating  with)  computers.  The  major  difference  be¬ 
tween  these  conventional  artificial  languages  and  natural  languages  is 
that  the  syntactic  and  semantic  properties  of  the  artificial  languages  are 
more  thoroughly  known  (in  the  sense  of  being  more  rigorously  formal¬ 
ized,  at  least  consciously)  than  are  those  of  natural  languages. 

Essentially, a programming language is a set of sentences (signals),
each of which a computer may receive and store internally as a data
structure. Data structures may have many “forms” (numbers, vectors,
matrices, lists, graphs, etc.) and may cause the computer to perform
many different “actions”; physically, a data structure is usually a
collection of electric or magnetic charges that can be sensed and altered
by the computer. The syntactic nature of the programming language is
given  when  we  finitely  describe  the  exact  forms  of  its  sentences  and  their 
data  structures.  The  semantic  nature  of  the  language  is  given  when  we 
specify  the  actions  that  each  data  structure  will  cause  to  be  performed. 
A  data  structure  can  cause  the  computer  to  perform  actions  in  the 
external  world  (e.g.,  move  a  mechanical  arm,  or  transmit  electric 
signals  to  a  printer)  or  it  can  cause  the  computer  to  create  new  internal 
data  structures,  or  modify  or  erase  those  that  are  already  present.  If  a 
data  structure  is  causing  the  computer  to  perform  actions,  then  it  is 
called a program; otherwise, we may simply call it data. It should be
noted that the distinction is not always a particularly good one: The
same  data  structure  may  be  a  program  and  also  be  data  that  the  com¬ 
puter  manipulates.  And,  though  it  is  a  good  approximation  to  say  that 
data  structures  “cause”  computers  to  perform  actions,  this  is  not  the 
entire  truth.  Whatever  action  is  caused  when  the  computer  meets  a 
data  structure  is  as  much  dependent  on  the  computer  as  it  is  on  the 
data  structure. 
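
The point that a data structure is a “program” only relative to the machine that meets it can be sketched in a few lines. This is an illustration only, not something taken from the text: the instruction names (`PUSH`, `ADD`, `PRINT`) and both functions are hypothetical. One function executes the list as a program; the other treats the very same list as data to be counted.

```python
# A data structure: a list of (operation, argument) pairs.
# The instruction names are invented for this sketch.
structure = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PRINT", None)]


def run_as_program(struct):
    """Machine 1: treats the structure as a program for a stack machine."""
    stack, output = [], []
    for op, arg in struct:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "PRINT":
            output.append(stack[-1])
    return output


def count_operations(struct):
    """Machine 2: treats the very same structure as data to be measured."""
    counts = {}
    for op, _ in struct:
        counts[op] = counts.get(op, 0) + 1
    return counts


print(run_as_program(structure))    # [5]
print(count_operations(structure))  # {'PUSH': 2, 'ADD': 1, 'PRINT': 1}
```

The two results differ because the “machines” differ, not because the data structure does.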

To  illustrate  this,  let  us  recall  the  model  that  was  introduced  in 
Chapter  2  for  computers.  That  chapter  defined  Turing  machines  and 
showed  how  a  universal  Turing  machine  could  simulate  any  given 
Turing  machine.  Also  described  were  polycephalic  universal  Turing 
machines,  which  were  credited  as  a  better  model  for  modern  com¬ 
puters.  In  the  context  of  the  current  discussion,  consider  any  collection 
of  symbols  printed  on  the  (possibly  n-dimensional)  tape(s)  of  a
polycephalic  universal  Turing  machine  to  be  a  “data  structure.”  It  is 
clear  that  this  agrees  with  what  has  been  said  above  about  data 
structures;  that  is,  some  of  the  symbols  on  the  tape  of  the  universal 
Turing  machine  may  be  a  “program”  and  cause  the  machine  (com¬ 
puter)  to  perform  actions.  However,  it  is  also  clear  that  the  actions 
performed  by  the  machine  at  a  given  moment  depend  as  much  on  the 
“state”  of  the  machine  as  on  the  symbols  that  it  reads  with  its  tape- 
heads;  and  it  is  clear  that  the  same  data  structure  might  “cause”  dif¬ 
ferent  machines  to  perform  different  actions. 

Let  us  continue  to  use  the  (universal,  polycephalic)  Turing 
machine  formalism  to  discuss  programming  languages  and  computers, 
taking  care  to  observe  some  ways  in  which  modern  computers  deviate 
from  the  model,  as  well  as  ways  in  which  they  satisfy  it.  The  reader  may 
recall  that  in  Chapter  2  the  notion  of  a  “descriptive  string”  was  intro¬ 
duced  to  show  how  a  universal  Turing  machine  could  simulate  a  given 
Turing  machine;  namely,  the  next-move  function  of  any  Turing  machine 
(say, T) can be described using a finite string of blanks and 1’s, called
a “descriptive string” for T. If this descriptive string is placed
appropriately on an otherwise blank tape of an appropriate universal
Turing machine (say, U), then U will use the descriptive string for T
in such a way that U will simulate T; that is, U will manipulate data
structures on some of its other tapes just as would T. However, T may
itself  be  a  universal  Turing  machine  and  thus  it  is  possible  to  have  a 
machine  U  simulate  a  machine  T  simulating  a  machine  T',  etc.  A  Turing 
machine  (simple,  universal,  polycephalic,  or  whatever)  is  basically  a 
procedure  for  manipulating  symbols  on  tapes  (i.e.,  data  structures). 
“Descriptive strings” are basically data structures that describe
procedures (Turing machines). A universal Turing machine is a Turing
machine that can use descriptive strings to carry out the procedures
they describe. In this sense, a program is basically a descriptive string
that can be used by some universal Turing machine.
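
The relationship between a descriptive string and the universal machine that uses it can be sketched as follows. The sketch makes several assumptions that are not in the text: the description is a Python dictionary rather than a string of blanks and 1’s, there is a single tape, and the names `simulate`, `successor`, `q0`, and `halt` are invented for the illustration. The function plays the role of U, and the dictionary plays the role of the description of T.

```python
def simulate(description, tape, state="q0", halt="halt", max_steps=1000):
    """Play the role of U: carry out the procedure that `description` encodes.

    description maps (state, symbol) -> (new_symbol, move, new_state),
    where move is -1 (left) or +1 (right). Blank squares read as " ".
    A missing (state, symbol) entry raises KeyError in this sketch.
    """
    cells = dict(enumerate(tape))   # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        if state == halt:
            break
        symbol = cells.get(head, " ")
        new_symbol, move, state = description[(state, symbol)]
        cells[head] = new_symbol
        head += move
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, " ") for i in range(lo, hi + 1)).strip()


# Description of a machine T that appends one more "1" to a block of 1's.
successor = {
    ("q0", "1"): ("1", +1, "q0"),    # skip over the existing 1's
    ("q0", " "): ("1", +1, "halt"),  # write a 1 on the first blank and stop
}

print(simulate(successor, "111"))  # 1111
```

Handing `simulate` a different dictionary makes it behave as a different machine, which is the sense in which the description is a program.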

Thus we may consider the set of descriptive strings (programs)
that can be used by a given universal Turing machine U to be a language
that is “understood” by U. Each descriptive string is a sentence that U
understands; the “meaning” of a given sentence (program) is the
procedure it describes; U may demonstrate its understanding of this
meaning by carrying out the procedure. Imagine a person “communicating”
with U in the following way: the person (whose name will be A)
has a means of printing symbols* on the squares of one of U’s tapes,
called the input tape; the signals that A transmits to U are precisely the
symbols A decides to print; in addition, A has the ability to read all
symbols that are printed on one of U’s tapes, which will be called the
output tape. A may decide that U “understands” a language of descriptive
strings if, whenever A prints a sentence of that language on U’s input
tape, U eventually prints, on its output tape, the result of carrying out
the procedure described by A’s sentence. (Of course, if A has some
knowledge about the “internal workings” of U (its next-move function,
or the symbols printed on its other tapes), then A may well decide that
U understands a given programming language, without very extensively
performing this experiment.)

It  should  be  emphasized  that  there  is  more  than  one  way  to  de¬ 
scribe  a  given  Turing  machine.  Chapter  2  presented  a  very  simple  way 
to  describe  the  next-move  function  of  any  given  Turing  machine;  that 
way produced, for each Turing machine, a simple string of blanks and
1’s. In essence, this was a description of a programming language,
the  sentences  of  which  were  strings  that  could  be  used  by  an  “ap¬ 
propriate”  universal  Turing  machine  to  carry  out  the  procedures  they 
described.  Besides  the  “blank-one  language”  presented  in  Chapter  2, 
one  can  certainly  design  other  programming  languages  for  describing 
procedures.  Furthermore,  one  can  certainly  design  universal  Turing 
machines  that  would  not  “understand”  the  blank-one  language  but 
would  understand  some  other  language  for  describing  procedures.  In¬ 
deed,  the  major  difference  between  modern  computers  and  poly- 
cephalic  universal  Turing  machines  is  that  computers  understand 
languages  that  are  much  simpler  and  easier  for  people  to  use  (when 
describing  complicated,  useful,  “real-world”  procedures)  than  is  the 
blank-one  language.  What  all  these  languages  have  in  common  is  that  any 


* This includes blanks; that is, A has the ability to erase symbols previously
printed on the input tape.
SYMBOLS

NUL  Null                                 DLE  Data Link Escape (CC)
SOH  Start of Heading (CC)                DC1  Device Control 1
STX  Start of Text (CC)                   DC2  Device Control 2
ETX  End of Text (CC)                     DC3  Device Control 3
EOT  End of Transmission (CC)             DC4  Device Control 4 (Stop)
ENQ  Enquiry (CC)                         NAK  Negative Acknowledge (CC)
ACK  Acknowledge (CC)                     SYN  Synchronous Idle (CC)
BEL  Bell (audible or attention signal)   ETB  End of Transmission Block (CC)
BS   Backspace (FE)                       CAN  Cancel
HT   Horizontal Tabulation                EM   End of Medium
     (punched card skip) (FE)
LF   Line Feed (FE)                       SUB  Substitute
VT   Vertical Tabulation (FE)             ESC  Escape
FF   Form Feed (FE)                       FS   File Separator (IS)
CR   Carriage Return (FE)                 GS   Group Separator (IS)
SO   Shift Out                            RS   Record Separator (IS)
SI   Shift In                             US   Unit Separator (IS)
SP   Space                                DEL  Delete

ABBREVIATIONS

(CC)  Communication Control
(FE)  Format Effector
(IS)  Information Separator

Figure 7-3. The ASCII “alphabet” for programming languages.
(Chapin, 1971, reprinted with permission.)
procedure  which  can  be  described  in  the  blank-one  language  can  also
be  described  in  the  languages  that  are  understood  by  real  computers, 
and  vice  versa.  Thus,  in  theory  a  computer  can  perform  exactly  those 
procedures  that  can  be  carried  out  by  a  universal  Turing  machine.  If, 
for  any  Turing  machine,  a  programming  language  contains  at  least  one 
sentence  that  describes  the  procedure  carried  out  by  that  Turing 
machine,  then  the  programming  language  is  said  to  be  a  universal 
programming  language.  Again,  however,  a  language  can  be  a  universal 
programming  language  only  with  respect  to  some  computer  (universal 
Turing  machine)  that  “understands”  it,  or  in  other  words,  one  having 
the  “language  capability”  for  that  language. 

The  discussion  so  far  describes  the  “semantics”  of  programming 
languages.  In  the  next  few  pages  the  “syntactics”  of  these  languages 
will  be  described. 

With  respect  to  syntactics,  note  first  that  all  programming  languages 
used  by  real  computers*  make  use  of  sentences  that  are  essentially
strings  (sequences)  of  symbols.  In  other  words,  they  use  “descriptive 
strings,”  though  these  descriptive  strings  consist  of  many  other  symbols 
besides  “blank”  and  “1.”  Figure  7-3  shows  a  set  of  symbols  that  may 
currently  appear  in  the  sentences  (programs)  of  universal  program¬ 
ming  languages.  One  may  transmit  strings  of  these  symbols  to  the 
computer  by  typing  them  out  on  a  typewriter  connected  to  the  com¬ 
puter,  or  by  “feeding”  the  computer  a  deck  of  appropriately  punched 
cards,  etc.  (The  reader  should  note  that  each  of  these  symbols  is 
actually  converted  into  a  seven-place  string  of  zeroes  and  ones  when  it 
is  read  into  the  computer;  the  “code”  for  making  this  conversion  is 
indicated  in  Fig.  7-3.)  A  total  of  128  symbols  make  up  this  “alphabet” 
of  current  programming  languages. 
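
The conversion of each symbol into a seven-place string of zeroes and ones can be illustrated with Python's built-in `ord` and `format`. The function name `to_seven_bits` is an assumption made for this sketch. Seven binary places give 2^7 = 128 distinct codes, which matches the 128 symbols of the alphabet.

```python
def to_seven_bits(text):
    """Return the 7-bit ASCII code of each character as a string of 0's and 1's."""
    codes = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not one of the 128 ASCII symbols")
        codes.append(format(code, "07b"))
    return codes


print(to_seven_bits("A1 "))  # ['1000001', '0110001', '0100000']
```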

As  stated  before,  a  programming  language  must  contain  at  least 
one  program  describing  each  Turing  machine,  if  it  is  to  be  “universal.” 
However,  there  are  an  infinite  number  of  different  Turing  machines  and 
therefore  a  universal  programming  language  must  contain  an  infinite 
number  of  programs,  or  sentences.  A  proper  description  of  the  syntactics 
of  the  language  must  describe  the  structural  nature  of  each  of  its  sen¬ 
tences.  This  can  be  done  either  by  presenting  all  of  the  sentences  in 
the  language  or  by  giving  some  set  of  rules  that  could  be  used  to  con¬ 
struct  any  sentence  belonging  to  the  language,  given  enough  time,  yet
could  not  be  used  to  construct  a  sentence  not  belonging  to  the  language. 
Such  a  set  of  rules  is  called  a  grammar  for  the  language.  To  properly 

* There is no reason why the sentences of a programming language would
have to be stringlike structures; indeed, some researchers have suggested that
eventually other types of structures will also be used (e.g., Minsky, 1970). In
the fourth section of this chapter, languages with sentences of more general
structural nature will be discussed.
describe  the  syntactics  of  a  universal  programming  language  it  is  clear
that  we  must  present  a  grammar  for  that  language.  Several  different 
kinds  of  grammars  for  describing  languages  have  been  developed  (see, 
e.g.,  Chomsky,  1959;  Post,  1944;  Backus,  1959).  A  formalization  for 
the  Chomsky  phrase-structure  grammars  will  be  presented.  For  the 
reader  who  wishes  to  skip  the  mathematics  of  this  formalization  (which 
is  based  on  that  given  by  Hopcroft  and  Ullman,  1969),  the  ordinary 
discourse  will  be  resumed  in  the  later  section  entitled  “Grammars, 
Machines,  and  Extensibility.” 

String  Languages 

If $V$ is a set of symbols, then $V^*$ represents the set of all finite
strings composed of elements from $V$. A string is an ordered series of
symbols (i.e., for some $n$, an ordered $n$-tuple). Thus, if $V = \{0, 1\}$,
then $V^* = \{\varepsilon, 0, 1, 01, 10, 00, 11, 111, 101, \ldots\}$, where
$\varepsilon$ represents the empty string, which does not contain any symbols.
We stipulate that $\varepsilon$ is always an element of $V^*$, for any $V$,
and use $V^+$ to denote $V^* - \{\varepsilon\}$. A language $L$ on the
alphabet $V$ is then any set $L$ that is a subset of $V^*$; that is,
$L \subseteq V^*$.
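
As a small illustration of these definitions, the strings of $V^*$ can be listed in order of length, and a language can be formed as a subset of them. The function name `strings_up_to` and the sample language (strings ending in 1) are assumptions made for the sketch; since $V^*$ is infinite, only strings up to a chosen length are produced.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """List every string in V* of length <= max_len, shortest first.

    The Python empty string "" plays the role of the empty string.
    """
    result = []
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result


V = {"0", "1"}
print(strings_up_to(V, 2))  # ['', '0', '1', '00', '01', '10', '11']

# A language on V is any subset of V*; here, the strings that end in "1".
L = {w for w in strings_up_to(V, 3) if w.endswith("1")}
print(sorted(L, key=lambda w: (len(w), w)))
```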

A grammar $G$ is defined to be an ordered quadruple

$G = (V_N, V_T, P, S)$

satisfying the following conditions:

1. $V_N$, $V_T$, $P$ are finite sets.

2. $V_N \cap V_T = \emptyset$ (no elements belong to both $V_N$ and $V_T$).

3. $S \in V_N$ ($S$ is called the start symbol).

4. $V_N$ and $V_T$ are sets of symbols. (The symbols belonging to
$V_N$ are referred to as production variables, and those belonging
to $V_T$ are referred to as terminals. The alphabet of $G$ is
$V = V_N \cup V_T$.)

5. $P$ is a set of written expressions of the form $\alpha \to \beta$ (or
equivalently, ordered pairs of the form $(\alpha, \beta)$), where
$\alpha \in V^+$ and $\beta \in V^*$.

It  is  customary  to  use  capital  Roman  (italic)  alphabet  letters  for 
production  variables  and  to  use  lower-case  letters  at  the  beginning  of 
the  Roman  (italic)  alphabet  for  terminals.  Strings  of  terminals  are 
represented  by  lower-case  letters  near  the  end  of  the  Roman  alphabet; 
strings  of  production  variables  and  terminals  are  denoted  by  lower-case 
Greek  letters. 

If $\alpha$ and $\beta$ are two strings, then $\alpha\beta$ denotes the string obtained by
writing down all elements in $\alpha$, followed by all elements in $\beta$. Thus, if
$\alpha = abc$ and $\beta = def$, then $\alpha\beta = abcdef$ and $\beta\alpha = defabc$.

We can now proceed to define the language $L(G)$ generated by a given
grammar $G$. To accomplish this, we need two mathematical relations,
$\Rightarrow_G$ and $\Rightarrow_G^{*}$, which can exist between strings in $V^*$.
The first relation is defined as follows: If $\alpha \to \beta$ is an element
of $P$, and $\gamma$ and $\delta$ are any strings in $V^*$, then
$\gamma\alpha\delta$ is said to directly derive $\gamma\beta\delta$, and we write
$\gamma\alpha\delta \Rightarrow_G \gamma\beta\delta$. The rewriting rule or
production rule $\alpha \to \beta$ is said to be applied to the string
$\gamma\alpha\delta$ to obtain $\gamma\beta\delta$.

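The second relation, $\Rightarrow_G^{*}$, is defined as follows: For two strings
$\alpha$ and $\beta$ in $V^*$, we say that $\alpha \Rightarrow_G^{*} \beta$
($\alpha$ derives $\beta$) iff we can obtain $\beta$ by the application
of some finite number of production rules in $P$ to $\alpha$. That is,
$\alpha \Rightarrow_G^{*} \beta$ iff there exist in $V^*$ strings
$\gamma_1, \gamma_2, \ldots, \gamma_n$ such that $\alpha \Rightarrow_G \gamma_1$,
$\gamma_1 \Rightarrow_G \gamma_2$, $\ldots$, $\gamma_{n-1} \Rightarrow_G \gamma_n$,
$\gamma_n \Rightarrow_G \beta$.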

The  language  L(G)  generated  by  the  grammar  G  is  now  defined 
to  be 

$L(G) = \{\, w \mid w \in V_T^* \text{ and } S \Rightarrow_G^{*} w \,\}$

In other words, a string is in $L(G)$ if it is made up entirely of
terminals and it can be derived from $S$. If $w$ can be derived from $S$, then a
sequence of strings

$S, \gamma_1, \gamma_2, \ldots, \gamma_n, w$

such that $S \Rightarrow_G \gamma_1$, $\gamma_1 \Rightarrow_G \gamma_2$, $\ldots$,
$\gamma_n \Rightarrow_G w$ is known as a derivation of $w$
in the grammar $G$. (If it is clear which grammar is involved, we use $\Rightarrow$
for $\Rightarrow_G$ and $\Rightarrow^{*}$ for $\Rightarrow_G^{*}$.)
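
The “directly derives” relation and the notion of a derivation can be sketched in code. The sketch assumes that every symbol is a single character and that a grammar's production rules are given as a list of (left side, right side) pairs; the function names and the small example grammar ($S \to aSb$, $S \to ab$) are hypothetical and do not come from the text.

```python
def one_step(string, productions):
    """Return every string that `string` directly derives.

    productions is a list of (alpha, beta) pairs; each symbol is one character.
    A rule alpha -> beta may be applied at any position where alpha occurs.
    """
    results = set()
    for alpha, beta in productions:
        start = string.find(alpha)
        while start != -1:
            results.add(string[:start] + beta + string[start + len(alpha):])
            start = string.find(alpha, start + 1)
    return results


def is_derivation(sequence, productions):
    """Check that each string in the sequence directly derives the next."""
    return all(
        nxt in one_step(cur, productions)
        for cur, nxt in zip(sequence, sequence[1:])
    )


# Hypothetical grammar, not from the text: S -> aSb, S -> ab.
P = [("S", "aSb"), ("S", "ab")]

print(sorted(one_step("aSb", P)))              # ['aaSbb', 'aabb']
print(is_derivation(["S", "aSb", "aabb"], P))  # True
print(is_derivation(["S", "aabb"], P))         # False
```

The last call returns False because `aabb` is derived from `S` only in two steps, not directly.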

As an example, consider the grammar $G_1 = (V_N, V_T, P, S)$, where

$V_N = \{S, A\}$, $V_T = \{0, 1\}$, and $P$ contains the following production rules.

1. $S \to A1$

2. $S \to S0$

3. $A \to S0$

4. $S \to 0$

5. $S \to 1$

6. $A \to 0$

The language $L(G_1)$ generated by this grammar contains all nonempty finite
strings made up of 0’s and 1’s in which there are no consecutive 1’s.
To illustrate, the string 10010 may be derived from $S$ as follows:


Given:          S
Apply Rule 2:   S0
Apply Rule 1:   A10
Apply Rule 3:   S010
Apply Rule 2:   S0010
Apply Rule 5:   10010
The reader may prove as an exercise that any nonempty string of 0’s and 1’s in
which there are no consecutive 1’s can be derived in this grammar from
S. (Suggestion: Use mathematical induction on the length of a string.)
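
Before attempting the proof, the claim can be checked mechanically on short strings. The sketch below is a test and not a proof: it reads the six rules of the grammar backward, from the last symbol of a string, and compares the result with the “no consecutive 1’s” condition for every nonempty string of length at most 10. The function names are assumptions made for the illustration.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(variable, w):
    """True if `variable` (S or A) derives the terminal string w.

    Every rule produces its terminal at the right-hand end, so the
    last symbol of w tells us which rules could have been applied last.
    """
    if variable == "S":
        if w in ("0", "1"):                  # rules 4 and 5
            return True
        if len(w) > 1 and w[-1] == "1":      # rule 1: S -> A1
            return derives("A", w[:-1])
        if len(w) > 1 and w[-1] == "0":      # rule 2: S -> S0
            return derives("S", w[:-1])
        return False
    # variable == "A"
    if w == "0":                             # rule 6
        return True
    if len(w) > 1 and w[-1] == "0":          # rule 3: A -> S0
        return derives("S", w[:-1])
    return False


def no_consecutive_ones(w):
    return "11" not in w


# Compare the two descriptions on every nonempty string of length <= 10.
for n in range(1, 11):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        assert derives("S", w) == no_consecutive_ones(w), w

print(derives("S", "10010"))  # True
print(derives("S", "0110"))   # False
```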

For another example, let

$G_2 = (V_N, V_T, P, S)$

$V_N = \{S, B, C\}$

$V_T = \{a, b, c\}$

and let $P$ contain the following rules.

1. $S \to aSBC$

2. $S \to aBC$

3. $CB \to BC$

4. $aB \to ab$

5. $bB \to bb$

6. $bC \to bc$

7. $cC \to cc$

To describe the language generated by this grammar we need to
introduce some new notation: If $\alpha$ is a string, then the expression $\alpha^n$
refers to the string $\alpha\alpha\cdots\alpha$, in which $\alpha$ is repeated exactly $n$ times. The
language $L(G_2)$ then contains the string $a^n b^n c^n$ for each $n \ge 1$, and no
other strings.

To obtain a given string $a^n b^n c^n$, we work in the following fashion.

Given:                                  $S$
Apply Rule 1 $n - 1$ times:             $a^{n-1} S (BC)^{n-1}$
Apply Rule 2:                           $a^n (BC)^n$
Apply Rule 3 as often as necessary*:    $a^n B^n C^n$
Apply Rule 4:                           $a^n b B^{n-1} C^n$
Apply Rule 5 $n - 1$ times:             $a^n b^n C^n$
Apply Rule 6:                           $a^n b^n c C^{n-1}$
Apply Rule 7 $n - 1$ times:             $a^n b^n c^n$
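
The recipe above can be followed mechanically. In the sketch below, which assumes one character per symbol, each rule is applied at the leftmost place where its left side occurs, and the number of applications of rule 3 is counted so that it can be compared with the figure given in the footnote. The names `RULES`, `apply_rule`, and `derive` are invented for the illustration.

```python
RULES = {
    1: ("S", "aSBC"),
    2: ("S", "aBC"),
    3: ("CB", "BC"),
    4: ("aB", "ab"),
    5: ("bB", "bb"),
    6: ("bC", "bc"),
    7: ("cC", "cc"),
}


def apply_rule(rule, string):
    """Apply one rule at the leftmost place where its left side occurs."""
    alpha, beta = RULES[rule]
    assert alpha in string, f"rule {rule} does not apply to {string}"
    return string.replace(alpha, beta, 1)


def derive(n):
    """Follow the recipe in the text; return (final string, rule-3 count)."""
    s = "S"
    for _ in range(n - 1):
        s = apply_rule(1, s)
    s = apply_rule(2, s)
    swaps = 0
    while "CB" in s:              # rule 3 "as often as necessary"
        s = apply_rule(3, s)
        swaps += 1
    s = apply_rule(4, s)
    for _ in range(n - 1):
        s = apply_rule(5, s)
    s = apply_rule(6, s)
    for _ in range(n - 1):
        s = apply_rule(7, s)
    return s, swaps


for n in range(1, 6):
    word, swaps = derive(n)
    print(n, word, swaps)
```

Each application of rule 3 moves one C to the right of one B, so the count equals the number of (C, B) pairs that start out in the wrong order.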

We now demonstrate that $L(G_2)$ does not contain any strings other
than those of the form $a^n b^n c^n$: first of all, we know that any derivation of
a string must start from the symbol $S$. Note that, given $S$, we cannot
apply rules 4, 5, 6, or 7 until we apply rule 2. And, once rule 2 is applied,
we can no longer use rules 1 or 2. A (nontrivial) derivation, then, must
start with the use of rule 1 and be followed by a series of applications
of rules 1 and 3 until the application of rule 2. (We could, of course,

* “Necessary” = $n(n - 1)/2$ times. (Why?)
start our derivation with rule 2, in which case only the string $abc$ can be
produced.) The string now consists of $n$ $a$’s followed by some ordering
of $n$ $B$’s and $n$ $C$’s. After applying any of rules 3 through 7 any number
of times, the string will still have the form $\alpha\beta$, where $\alpha$ consists entirely
of terminals and $\beta$ is a nonempty string consisting entirely of $B$’s and $C$’s.

Now, if all the $B$’s are converted to $b$’s before any $C$ is converted to
a $c$, the string will have the form $a^n b^n C^n$, and the only string of terminals
that can possibly be derived from this is $a^n b^n c^n$. So, we may assume that
some $C$ is converted to a $c$ before all the $B$’s are converted to $b$’s; the
string now has the form $a^n b^i c \alpha$, where $i < n$ and $\alpha$ is a string of $B$’s
and $C$’s (including at least one $B$). The only rules that can now be applied
are rules 3 and 7; their use can result only in a string of the form $a^n b^i c^j \alpha$
(where $i < n$ and $j \le n$), such that $\alpha$ contains at least one $B$. They are
still the only rules that can be applied, and their use continues to give a
string of the same form; therefore we conclude that a string without
variables cannot be produced if a $C$ is converted to a $c$ before all $B$’s
are converted to $b$’s. Thus, $L(G_2) = \{a^n b^n c^n \mid n \ge 1\}$.

As an exercise, the reader should inspect the grammar
$G_3 = (V_N, V_T, P, S)$, where $V_N = \{S, A, B, C\}$, $V_T = \{a, b, c\}$, and $P$
contains the following rules.

1. $S \to aAC$

2. $S \to aC$

3. $A \to aAB$

4. $A \to aB$

5. $C \to bc$

6. $Bb \to bB$

7. $Bc \to bcc$

This grammar also generates the language of all strings of the form
$a^n b^n c^n$, $n \ge 1$. Grammars that generate the same language are said to be
equivalent.
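
Equivalence of two grammars cannot in general be settled by testing, but agreement on all short strings is easy to check. The sketch below enumerates every terminal string of bounded length that each grammar generates, using a breadth-first search over the strings reachable from S. It relies on two assumptions that hold for both grammars here: no rule shortens a string (so longer intermediate strings can be discarded), and terminals are lower-case letters. The function name `language_up_to` is invented for the illustration.

```python
from collections import deque


def language_up_to(productions, max_len, start="S"):
    """Terminal strings of length <= max_len derivable from `start`.

    Relies on no rule shortening a string, so any intermediate string
    longer than max_len can be discarded. Lower-case symbols are terminals.
    """
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        s = queue.popleft()
        if s.islower():               # only terminals remain
            words.add(s)
            continue
        for alpha, beta in productions:
            i = s.find(alpha)
            while i != -1:
                t = s[:i] + beta + s[i + len(alpha):]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
                i = s.find(alpha, i + 1)
    return words


G2 = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
      ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
G3 = [("S", "aAC"), ("S", "aC"), ("A", "aAB"), ("A", "aB"),
      ("C", "bc"), ("Bb", "bB"), ("Bc", "bcc")]

expected = {"a" * n + "b" * n + "c" * n for n in range(1, 5)}
print(language_up_to(G2, 12) == expected)
print(language_up_to(G3, 12) == expected)
```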

It  is  possible  to  distinguish  between  different  types  of  grammars 
on  the  basis  of  their  sets  of  production  rules.  The  reason  for  making 
the  distinction  is  that  there  exists  a  correspondence  between  each  type 
of  grammar  and  a  certain  type  of  machine. 

If every production rule in a grammar is of the form $A \to a$ or
$A \to aB$, then the grammar is said to be a type 3, or regular, grammar,
and to generate a type 3, or regular, language. (An example is the
grammar $G_1$ above.) If every production in a grammar is of the form
$A \to \alpha$, such that $A$ is a production variable and $\alpha \in V^*$, then the
grammar is called a type 2, or context-free, grammar (and it generates a
type 2 or context-free language). Finally, if every production in a
grammar is of the form $\alpha \to \beta$ such that the number of symbols in the string
$\beta$ is always greater than or equal to the number of symbols in the string
$\alpha$, then we have a type 1, or context-sensitive, grammar (generating a
type 1, or context-sensitive, language).

The reason for the type definition “context-sensitive” is that the
class of languages generated can be shown to be the same if we define
instead that the production rules in a context-sensitive grammar be of
the form $\alpha A \gamma \to \alpha\beta\gamma$, where $A \in V_N$ and $\alpha$, $\beta$, and $\gamma$
are in $V^*$ (with $\beta$ nonempty). In
other words, if $A$ appears “in the context of” $\alpha$ and $\gamma$, it can be replaced
by $\beta$. Examples of context-sensitive grammars are $G_2$ and $G_3$,
discussed above (and also $G_1$; every type 3 language (grammar) is also
type 2; every type 2 language (grammar) is also type 1).

If no restrictions are placed on the form of the production rules of
a grammar (other than the necessary ones, $\alpha \in V^+$ and $\beta \in V^*$, for any
rule $\alpha \to \beta$), it may be referred to as a type 0 or “general”
phrase-structure grammar (and the same names are given to the language it
generates).
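
The four types can be told apart by inspecting the production rules alone, which the following sketch does. It assumes one character per symbol and follows the definitions given above, with one stated departure: the text gives the regular form as $A \to a$ or $A \to aB$, whereas the rules of $G_1$ are written in the mirror-image form ($A \to Ba$). Grammars of that mirror-image form are also commonly counted as regular, so the sketch accepts either form provided it is used throughout. The function name `grammar_type` is an assumption.

```python
def grammar_type(productions, variables):
    """Return the most restrictive type (3, 2, 1, or 0) the rules satisfy.

    productions: list of (alpha, beta) strings, one character per symbol.
    variables:   set of production variables; other symbols are terminals.
    """
    def is_var(x):
        return x in variables

    def right_linear(a, b):       # A -> a  or  A -> aB
        return (len(a) == 1 and is_var(a) and 1 <= len(b) <= 2
                and not is_var(b[0]) and (len(b) == 1 or is_var(b[1])))

    def left_linear(a, b):        # A -> a  or  A -> Ba (mirror image)
        return (len(a) == 1 and is_var(a) and 1 <= len(b) <= 2
                and not is_var(b[-1]) and (len(b) == 1 or is_var(b[0])))

    if (all(right_linear(a, b) for a, b in productions)
            or all(left_linear(a, b) for a, b in productions)):
        return 3
    if all(len(a) == 1 and is_var(a) for a, b in productions):
        return 2
    if all(len(b) >= len(a) for a, b in productions):
        return 1
    return 0


G1 = [("S", "A1"), ("S", "S0"), ("A", "S0"), ("S", "0"), ("S", "1"), ("A", "0")]
G2 = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
      ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

print(grammar_type(G1, {"S", "A"}))       # 3
print(grammar_type(G2, {"S", "B", "C"}))  # 1
```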

Grammars,  Machines,  and  Extensibility 

The  basic  correspondence  between  grammars  and  machines  can 
be  fairly  simply  described,  making  use  of  the  concepts  of  “input  tape” 
and “output tape” given earlier. We say a machine accepts a language
iff whenever any sentence of the language is placed on the (otherwise
blank) input tape of the machine, the machine eventually prints a
“1” on its (otherwise blank) output tape and halts. In essence, a machine
that  accepts  a  language  L  is  a  “procedural  embodiment”  of  a  grammar 
for  that  language.  It  can  be  shown  that  a  phrase-structure  language  is 
of  type  0  iff  there  is  a  Turing  machine  that  accepts  it.  (See  the  Exer¬ 
cises.)  Three  special  types  of  Turing  machine  can  be  defined — linear 
bounded  automata,  pushdown  automata,  and  finite-state  automata  (see 
Chapter  2) — and  it  can  be  shown  that  they  correspond  to  acceptors 
for  the  context-sensitive,  context-free,  and  regular  languages,  respec¬ 
tively. 
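
As an illustration of an acceptor for a regular language, the sketch below gives a finite-state machine for the language of $G_1$ (nonempty strings of 0’s and 1’s with no consecutive 1’s). The state names and the convention of returning the output tape as a Python string are assumptions made for the example; the machine “prints a 1” exactly when the sentence on its input tape belongs to the language.

```python
# Transition table of a finite-state acceptor.
# States (names invented for this sketch):
#   "start"  - nothing read yet
#   "zero"   - last symbol read was 0
#   "one"    - last symbol read was 1
# A missing entry means the machine stops without accepting.
TRANSITIONS = {
    ("start", "0"): "zero",
    ("start", "1"): "one",
    ("zero", "0"): "zero",
    ("zero", "1"): "one",
    ("one", "0"): "zero",
    # no entry for ("one", "1"): two consecutive 1's are rejected
}
ACCEPTING = {"zero", "one"}


def accepts(input_tape):
    """Return the output tape: "1" if the sentence is accepted, else ""."""
    state = "start"
    for symbol in input_tape:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return ""
    return "1" if state in ACCEPTING else ""


print(repr(accepts("10010")))  # '1'
print(repr(accepts("0110")))   # ''
```

The machine needs no memory beyond its current state, which is what distinguishes finite-state automata from the more powerful acceptors named above.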

Of  course  it  is  desirable  to  do  more  than  merely  recognize  that  a 
given  sentence  belongs  to  a  language,  especially  if  we  are  concerned 
with  programming  languages.  It  is  also  necessary  to  “understand”  the 
sentence  itself,  and  implement  the  procedure  it  describes.  A  computer 
may  come  by  this  understanding  automatically,  just  as  the  universal 
Turing  machine  described  in  Chapter  2  would  be  automatically  able  to 
implement  the  procedure  described  by  a  sentence  in  its  “blank-one” 
language.  In  essence,  the  understanding  of  that  language  was  “wired 
in”  with  the  next-move  function  of  the  machine.  In  general,  every  com¬
puter  will  be  able  to  understand  some  programming  language  in  this 
automatic  sense;  the  programming  language  that  is  “wired  in”  to  a  com¬ 
puter  is  commonly  known  as  its  machine  language. 

It is natural to ask whether a computer can understand programming
languages other than its machine language. The answer to this
question is yes. Let’s suppose we have some computer U which has
as its machine language the programming language L, and that we want
U to be able to “understand” sentences in another programming
language, L'. We assume that L' is at least type 0, and may be type 1, 2,
etc. (See note 7-5.) Because U is universal, and because L' is type 0,
we know that we can write a program (find a sentence in L) that
describes a procedure which will accept the sentences of L'. Also, we
know that U will be capable of implementing this procedure. Thus, we
can “program” U to accept the sentences of L'. In fact, however, it is
possible to do more. Given a description of a grammar G' for L' and a
sentence w' in L', we can “program” U to find the derivations of w'
with respect to the grammar G'. Normally there will be only one such
derivation and it will provide structural information about the procedure
described by w' that can be used to construct a sentence w, in the
language L, which describes the same procedure. (The sentences w and w'
are said to be computationally equivalent.) One can attempt to describe
the procedure by which the sentence w is produced from the sentence
w', and generalize to a procedure that will produce a computationally
equivalent sentence in L, given any sentence in L'. If this general
procedure (called a translator from L' to L) can be described by a sentence
p in L (and we know that it can be, if L is a universal programming
language), then p can be used to extend the “language capability” of
the computer U. If we give p and any sentence of L' to U, that sentence
of L' will be converted into a sentence of L and the procedure it
describes can then be implemented by U (notes 7-6, 7-7).
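
A toy version of such a translator can make the idea concrete. Everything specific in the sketch below is an assumption chosen for illustration: the “machine language” L is taken to be postfix arithmetic executed by a small stack machine (`run`), the new language L' is ordinary infix arithmetic with `+`, `*`, and parentheses, and the translator (`translate`) recovers the structure of an L' sentence by following a small grammar and emits a computationally equivalent L sentence. Malformed input is not handled.

```python
import re


def run(sentence):
    """The 'machine' U: it understands only L, postfix token sequences."""
    stack = []
    for token in sentence:
        if token in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if token == "+" else left * right)
        else:
            stack.append(int(token))
    return stack.pop()


def translate(text):
    """The translator p: turn an infix sentence of L' into a sentence of L.

    Grammar G' assumed for L' (hypothetical):
        E -> T | E + T      T -> F | T * F      F -> number | ( E )
    """
    tokens = re.findall(r"\d+|[+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        out = term()
        while peek() == "+":
            advance()
            out += term() + ["+"]
        return out

    def term():
        out = factor()
        while peek() == "*":
            advance()
            out += factor() + ["*"]
        return out

    def factor():
        if peek() == "(":
            advance()
            out = expr()
            advance()          # skip the closing ")"
            return out
        return [advance()]

    return expr()


w_prime = "2 + 3 * (4 + 1)"
w = translate(w_prime)
print(w)       # ['2', '3', '4', '1', '+', '*', '+']
print(run(w))  # 17
```

Once `translate` is available, `run(translate(...))` behaves as though the machine understood infix notation directly, although nothing about `run` itself has changed.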

Thus, it is possible to find sentences in L which will “extend” the
language capability of U, just as it is possible to find sentences in
English that a person can use to extend his own language capability.
When one looks at a modern computer one sees a hierarchy of languages
L, L', L'', L''', . . . that are each ultimately embedded in the machine
language of that computer. We can now make use of many different
kinds of programs (“compilers,” “interpreters,” etc.) for extending a
computer’s language capability (see Earley and Sturgis, 1970; Irons,
1970).

Again, the reason for extending the language capability of a computer
is not exactly that the computer will thereby be able to do things
it could not do before; in theory, one universal programming language
is  just  as  good  as  another,  because  every  universal  programming  lan¬ 
guage  describes  the  same  class  of  procedures  (namely,  those  that  can 
be  carried  out  by  Turing  machines).  In  practice,  however,  we  find  that 
any  given  universal  programming  language  will  provide  very  simple 
sentences  describing  some  procedures  and  very  complex  sentences 
describing  other  procedures.  Thus,  a  “blank-one”  sentence  describing 
a  Turing  machine  procedure  for  matrix  multiplication  would  be  very 
long  and  difficult  for  a  person  to  handle.  The  same  procedure  may  be 
described  simply  in  other  universal  programming  languages  such  as 
FORTRAN,  ALGOL,  and  SAIL.  Since  people  want  to  be  able  to  describe 
procedures  like  matrix  multiplication  easily,  it  is  customary  to  extend 
the  computer’s  language  capability  to  include  these  “higher-level”  lan¬ 
guages  (note  7-8). 
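
For comparison, here is how briefly matrix multiplication can be described in one present-day higher-level language. Python is used only as an example and is not one of the languages named in the text; the function name `matmul` and the representation of a matrix as a list of rows are assumptions.

```python
def matmul(a, b):
    """Multiply an m-by-k matrix by a k-by-n matrix (lists of rows)."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```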

In  addition  to  this  extensibility,  one  can  design  programming  lan¬ 
guages  to  facilitate  the  use  of  programs  with  “self-reference.”  Thus, 
“recursive  programs”  are  easily  described  in  LISP.  However,  no  one
really  knows  the  precise  relationship  between  the  self-reference  of  recur¬ 
sive  programs  and  the  self-reference  of  natural-language  sentences. 
Also,  it  is  possible  to  design  programming  languages  with  “redundancy” 
(e.g.,  “error-coding”  of  instructions;  see  Lucky,  1969). 

Universal  programming  languages  have,  in  one  way  or  another, 
two  of  the  most  important  characteristics  that  are  possessed  by  human 
languages:  extensibility  and  self-reference.  These  characteristics  are
not  possessed  in  any  form  by  any  other  known  organism-level  language. 
So,  it  may  not  be  so  surprising  to  read  in  the  next  section  that  com¬ 
puters  can  now  understand  human  languages  much  better  than  monkeys 
can. 

PROGRAMS THAT “UNDERSTAND” NATURAL LANGUAGE

Five  Problems 

In  the  preceding  section  we  saw  how  it  is  possible  to  extend  the 
“language  capability”  of  a  computer  so  it  can  understand  programming 
languages  other  than  its  machine  language.  To  make  such  an  extension, 
the  computer  might  be  given  a  sentence  (program)  in  some  program¬ 
ming  language  that  it  already  understands  (e.g.,  its  machine  language) 
that  will  enable  it  to  “translate”  sentences  in  the  new  language;  that  is, 
convert  them  into  sentences  it  can  already  understand.  Because  such 
a “translator” sentence (program) describes a procedure for understanding
the new language, we often say that the sentence itself “under¬
stands”  the  language,  even  though  in  fact  the  understanding  results 
from  the  interaction  of  the  translator  sentence  and  the  computer.  Thus, 
it  is  common  to  talk  of  programs  that  understand  languages. 

It  is  natural  to  ask  how  far  the  language  capabilities  of  computers 
can  be  extended.  For  example,  is  it  possible  to  write  programs  that 
understand  the  natural  languages  spoken  by  humans?  Can  we  extend 
the  language  capabilities  of  a  computer  to  the  extent  that  it  becomes 
possible  for  us  to  describe  in  English  the  procedures  we  wish  it  to  per¬ 
form?  Can  English  be  used  as  a  “programming  language”  for  a  com¬ 
puter?  If  this  could  be  done,  people  would  not  have  to  learn  special 
programming  languages  in  order  to  make  use  of  computers,  and  the 
utility  of  computers  might  be  greatly  increased. 

Artificial  intelligence  research  is  currently  concerned  with  prob¬ 
lems  such  as  how  to  program  computers  to  answer  questions  stated  in 
English  (or  other  natural  languages),  solve  problems  stated  in  English, 
and participate in English conversations with people (or, for that matter,
other computers). Ultimately, AI research may consider a variety of
more  difficult  problems,  such  as  whether  computers  can  translate  from 
one  natural  language  to  another  (note  7-9),  perform  complicated 
secretarial  work  (e.g.,  take  dictation),  or  play  “language  games”  (note 
7-10).  Although  the  emphasis  here  is  on  current  achievements  and 
problems,  it  is  well  to  keep  the  “more  difficult”  ones  in  mind  (cf.  Polya, 
1945).  The  evidence  presented  suggests  that  all  of  these  problems  may 
eventually  be  solved  (note  7-11). 

Before  continuing,  the  reader  should  note  that  this  discussion  will 
not  deal  with  the  machine  understanding  of  spoken  languages,  even 
though  reference  will  often  be  made  to  the  “speaker”  of  a  sentence, 
simply  to  follow  a  convention.  Techniques  for  enabling  computers  to 
hear,  understand,  and  make  spoken  words  and  sentences  are  still  in  a 
relatively primitive state of development. The reader who is interested
in this subject should refer to Astrahan (1970), Bobrow (1968), Denes
and  Mathews  (1968),  D.  R.  Hill  (1967),  and  Mermelstein  (1969). 

Unless  otherwise  stated,  the  discussion  throughout  this  section 
will  always  be  concerned  with  computer  programs  that  “understand” 
English  sentences  (usually  submitted  via  a  computer  terminal),  which 
will  be  simply  called  “language  understanding  programs.”  In  the  sub¬ 
sequent  pages  a  variety  of  such  programs  will  be  discussed.  The  ap¬ 
proach  and  terminology  are  largely  modeled  after  that  of  Winograd’s 
(1971,  1972)  work,  which  the  reader  is  encouraged  to  consult.  Space 
does not permit complete descriptions of each of the many language-understanding
programs that have been written, so instead an attempt
is  made  to  summarize  the  most  important  approaches  that  have  been 
followed  in  their  design.  However,  special  attention  is  given  to 
Winograd’s  program,  one  of  the  most  linguistically  powerful. 

Each  language-understanding  program  may  typically  be  said  to 
confront four highly interrelated problems: a syntax problem, a semantics
problem, an inference problem, and a generation problem. (The
question  of  how  these  problems  are  interrelated,  and  of  what  use  a 
language  understanding  program  should  make  of  this  interrelation,  is 
itself  a  fifth  problem  faced  by  these  programs,  which  will  be  referred 
to  as  the  integration  problem.)  To  see  how  these  problems  arise,  re¬ 
turn  to  the  discussion  of  “understanding”  and  languages.  The  viewpoint 
presented  in  the  early  pages  of  this  chapter  was  that  one  “understands” 
a sentence in a language by making a model of its “meaning.” Mention
was  made  of  three  aspects  of  a  sentence’s  transmission  which  should 
be  considered  as  elements  of  its  meaning:  the  things  that  cause  the 
sentence  to  be  transmitted  (e.g.,  the  speaker’s  “motives”  for  using  the 
sentence);  the  things  that  the  sentence  causes  when  it  is  transmitted 
(e.g.,  the  sentence  might  tend  to  have  an  “emotional  effect”  when  it  is 
used);  the  things  that  the  sentence  describes  (e.g.,  objects,  events,  con¬ 
cepts,  procedures,  other  sentences,  or  an  attitude  or  wish  held  by  its 
speaker). It is clear that with this interpretation the “meaning” of a
sentence  is  highly  dependent  on  the  situation  in  which  it  is  used. 

However,  some  things  about  a  sentence  do  not  usually  depend  on 
the  situation,  or  “context”  of  its  use:  namely,  the  sequence  of  words  and 
letters  that  make  up  the  sentence  itself,  the  possible  derivations  (or 
“parsings”)  of  that  sentence  in  one’s  grammar  for  English,  and  the  set 
of  possible  “meanings”  of  the  words  in  the  sentence  (which  is  what 
makes  dictionaries  useful).  These  relatively  constant  attributes  (the 
latter  two  are  of  course  variable,  by  the  extensibility  of  natural  lan¬ 
guages)  of  the  sentence  help  us  determine  its  “meaning,”  if  we  also 
have  knowledge  about  the  “situation.” 

The  syntax  problem  has  two  basic  subproblems:  What  is  a  good 
grammar  for  English?  How  should  we  obtain  the  parsing(s)  of  a  sen¬ 
tence?  The  semantics  problem  is  that  of  finding  a  good  formalism  in 
which  to  express  models  for  “meanings”  and  “situations.”  The  inference 
problem  has  three  basic  subproblems:  How  can  we  use  our  model  for 
the  “situation”  and  the  constant  attributes  of  a  sentence  to  make  a  model 
of  its  meaning?  How  do  we  change  our  model  of  the  “situation”  when 
we  determine  the  “meaning”  of  a  sentence  we  receive?  How  do  we 
determine  the  “meanings”  that  we  wish  to  convey,  given  that  we  have 
determined  models  for  the  current  “situation”  and  the  “meanings”  of 
the sentences we’ve received? The generation problem is that of finding
and  transmitting  sentences  that  will  have  the  “meanings”  we  wish  to 
convey.  (The  semantics  and  inference  problems  are  often  jointly 
referred  to  as  the  representation  problem.) 

If  in  the  sentences  of  the  preceding  paragraph  we  substitute  “the 
computer”  for  “we,”  then  we  obtain  the  basic  problems  that  confront 
AI researchers who attempt to make programs that understand English.
None  of  these  problems  has  yet  been  completely  solved,  if  we  make  a 
comparison  with  human  abilities  to  converse,  nor  is  this  surprising. 
Natural  languages  are  designed  to  be  useful  in  almost  the  full  range  of 
situations  that  people  encounter,  whereas  computers  currently  are 
acquainted  with  a  relatively  small  range  of  situations.  Success  in  achiev¬ 
ing  language-understanding  programs  is  limited  by  the  extent  to  which 
computers can be enabled to reason about real-world situations. An example
due to Schank (1971a,b) is

“We  saw  the  Grand  Canyon  flying  to  Chicago.” 

This  sentence  is  syntactically  ambiguous  (has  two  equally  plausible 
parsings)  unless  the  computer  knows  something  about  the  real-world 
nature  of  locations  and  the  ability  to  fly. 

Subsequent  pages  review  the  approaches  that  have  been  used  in 
designing  language-understanding  programs  that  can  solve  these  prob¬ 
lems.  A  brief  collection  of  conversations  with  computers,  to  illustrate 
the success AI researchers have had to date, will then be presented. The
next  section  discusses  some  of  the  “open  questions”  that  still  remain, 
concerning  the  relevance  of  “semantic  information  processing”  to  artifi¬ 
cial  intelligence  in  general. 

Syntax 

The  earliest  language-understanding  programs,  which  were  writ¬ 
ten  for  the  purpose  of  mechanical  translation  (note  7-10),  were  de¬ 
veloped  before  linguists  had  achieved  any  very  precise  theories  of  syntax 
for  natural  languages.  Certainly  the  theories  that  then  existed  were  not 
precise enough to suggest explicitly how computers should be programmed
to understand natural languages. As a consequence, the designers
of those programs were forced to produce their own ad hoc
systems for parsing sentences (i.e., parsers). Because they lacked a
comprehensive  plan  for  designing  their  programs,  the  programs  tended 
to  become  more  and  more  complex,  difficult  to  understand  and  debug, 
and  difficult  to  improve;  therefore  the  programs  eventually  had  to  be 
abandoned,  during  the  latter  part  of  the  1950s.  After  that  time,  and 
until  1968,  designers  of  language-understanding  programs  followed 
either of two main approaches to the basic problems of syntax: the
restricted  pattern-matching  approach  and  the  context-free  approach. 

The  restricted  pattern-matching  approach  consisted  essentially  of 
accepting  the  limitations  on  syntax  that  were  implied  by  the  lack  of  a 
good  formalism  for  expressing  and  using  grammars  for  English.  The 
researchers  who  followed  this  approach  (including  Bobrow,  Raphael, 
and  Weizenbaum)  recognized  that  the  syntax  problem  would  still  have 
to  be  solved  eventually  by  really  general  language-understanding  pro¬ 
grams,  but  they  managed  to  show  that  interesting  linguistic  behavior, 
relating  to  the  other  basic  problems  of  semantics  and  inference,  can  be 
obtained  even  if  only  minimal  solutions  to  the  syntax  problem  are  pro¬ 
vided.  The  language-understanding  programs  they  developed  did  not 
really  use  “grammars”  in  any  general  sense,  nor  did  they  parse  sen¬ 
tences.  Instead,  these  programs  were  designed  to  extract  semantic 
information  from  sentences  by  matching  them  against  any  of  a  small, 
prespecified,  constant  number  of  “templates”  or  “forms.”  Examples  of 
the forms used by Bobrow (1968), which he called “linguistic
forms,” are “____ and ____,” “____ equals ____,” “____’s
father,” “salary of ____,” “not ____,” “____ gave ____ to ____,”
etc. Bobrow’s program (known as STUDENT) was designed to follow a
relatively  rigid  procedure  of  successively  “filling  in  blanks”;  thus,  it 
might  “parse”  the  sentence,  “The  salary  of  John’s  father  equals  100 
dollars,” by filling in blanks as in Fig. 7-4.

[Figure 7-4. Pattern-matching for the sentence “The salary of John’s
father equals 100 dollars.”]

It should be clear that
STUDENT had a “recursive” ability to fill in blanks; the blanks of a given
template might be filled in by other templates. As STUDENT matched a
given sentence against its collection of linguistic forms, it could be
guided by the matchings it obtained in a process of setting up an
algebraic equation to represent the relationship between the “variables”
of the sentence (e.g., “John”). Thus, STUDENT was capable of converting
each of a limited (although infinite) variety of English sentences into
an equivalent algebraic equation. Given a collection of such sentences,
STUDENT would form a set of simultaneous equations. STUDENT was
designed to use special linguistic forms like “find ____” to identify the
variables for which it should solve, given such a set of simultaneous
equations, and it contained a special set of programs it could then use
to actually solve sets of simultaneous, elementary algebra equations.
Finally, STUDENT was capable of “assuming” certain variables to be
equivalent (based on simple structural similarities between the ways
they were named in the initial set of English sentences) if the operation
of its problem-solving routines revealed that it had been given more
variables than equations. If this technique failed, STUDENT could ask
the  person  who  supplied  the  problem  for  more  information.  Thus, 
STUDENT  was  capable  of  performing  a  fairly  difficult  intellectual  task, 
that  of  understanding  and  solving  algebra  word-problems. 
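The recursive “filling in of blanks” described above can be sketched in a few lines. The sketch below is only an illustration of the idea, not Bobrow’s implementation: the template list, the use of regular expressions, and the function name `fill_blanks` are all assumptions made here, and only three of the linguistic forms are included.

```python
import re

# Hypothetical templates, loosely modeled on the "linguistic forms" quoted
# above. Order matters: the form that splits a whole sentence comes first.
TEMPLATES = [
    (re.compile(r"^(?P<left>.+) equals (?P<right>.+)$", re.IGNORECASE), "equals"),
    (re.compile(r"^(the )?salary of (?P<arg>.+)$", re.IGNORECASE), "salary"),
    (re.compile(r"^(?P<arg>.+)'s father$", re.IGNORECASE), "father"),
]


def fill_blanks(phrase):
    """Return a nested tuple showing which templates the phrase fills."""
    phrase = phrase.strip().rstrip(".")
    for pattern, name in TEMPLATES:
        m = pattern.match(phrase)
        if m:
            # Each blank is matched against the templates in turn (recursion).
            blanks = [m.group(key) for key in sorted(m.groupdict())]
            return (name,) + tuple(fill_blanks(b) for b in blanks)
    return phrase  # no template applies, so the phrase is treated as an atom


structure = fill_blanks("The salary of John's father equals 100 dollars.")
print(structure)
```

The nested result plays the role of the filled-in diagram: the outermost form is “____ equals ____,” and its left blank is filled by “salary of ____,” whose blank is in turn filled by “____’s father.” Turning such a structure into an algebraic equation would be a further step that this sketch does not attempt.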

It should be evident from this description that STUDENT’s ability
to  “understand”  algebra  problems  that  were  stated  in  English  was 
somewhat  limited.  One  could  easily  find  problem  statements  that  it 
could  not  understand,  using  its  restricted  pattern-matching  approach  to 
syntax. Still, STUDENT and the other early programs that used this approach
demonstrated some rather impressive (and surprising) behavior.
STUDENT fostered two other special-purpose question-answering programs,
CARPS and HAPPINESS (Charniak, 1969; and Gelb, 1971), respectively
designed to solve calculus and probability problems stated in
English. 

The  context-free  approach  to  the  problem  of  syntax  involved  find¬ 
ing  simplified  subsets  of  English  that  could  be  described  by  well-under¬ 
stood  kinds  of  phrase-structure  string  grammars;  much  research  con¬ 
centrated  specifically  on  the  use  of  context-free  grammars,  owing  to 
their  proven  value  as  the  basis  for  ordinary  programming  languages. 
However,  the  full  complexity  of  English  syntax  is  not  easily  describable 
by  phrase-structure  grammars  (see  Winograd,  1971,  1972,  for  a  discus¬ 
sion  of  reasons).  Thus,  the  context-free  approach  has  had  only  limited 
success. Rather than discuss this approach in any detail, we refer the
reader to Simmons (1965) and Kuno (1965).

Recursive  Approaches  to  Syntax 

In  many  ways,  the  year  1968  was  a  good  one  for  language-under¬ 
standing  programs.  From  the  standpoint  of  syntax,  it  was  the  year  in 
which  the  designers  of  these  programs  freed  themselves  from  the  restric¬ 
tiveness  of  phrase-structure  grammars  by  taking  a  new  approach  to 
syntax. The essence of this approach to syntax is the realization that
language-understanding  programs  need  not  be  restricted  to  the  use  of 
phrase-structure  grammars  any  more  than  computers  need  be  restricted 
to  the  simulation  of  Turing  machines.  Phrase-structure  grammars  and 
Turing  machines  are  adequate  simple  formalizations  for  the  infinite 
classes  of  all  machine-understandable  languages  and  all  machine- 
computable  functions,  but  they  are  extremely  poor  formalizations  in 
which  to  describe  the  relatively  small  classes  of  natural  languages  and 
intelligent procedures.⁸ The result of using this new approach has been
the  discovery  that  natural  language  grammars  can  be  profitably  de¬ 
scribed  as  certain  kinds  of  recursive  procedures.  Two  ways  of  describing 
these  procedures  have  been  developed,  corresponding  to  the  formaliza¬ 
tion  of  augmented  state  transition  networks  (see  Thorne,  Bratley,  and 
Dewar,  1968;  Bobrow  and  Fraser,  1969;  Woods,  1969)  and  to  the 
programming language PROGRAMMAR (Winograd, 1971).

Figure 7-5. A simple augmented transition network grammar. (Kaplan, 1971, reprinted with permission.)

⁸ Actually, these classes are not really so “small”; perhaps it is better to say
that they have a very low “density” when one tries to find them by searching
through the classes of all languages and functions, as represented by
phrase-structure grammars and Turing machines.

Augmented transition networks are a generalization of the transition
networks for the finite-state machines discussed in Chapter 2.
Figure 7-5 shows an example of a simple augmented transition network;
as can be seen, it is basically a graphlike structure similar to the
transition networks discussed previously. However, two important
changes should be noted: First, the augmented transition network
represents a recursive procedure. This is achieved by allowing the label
of an arc to refer to a state, using either a PUSH or a POP command. A
PUSH command means that the transition along its arc should be postponed
(i.e., the name of the state at the head of the arc should be
placed on the top of a “pushdown” storage unit) and the new state of
the machine should instead become that referred to (explicitly) by the
PUSH command. Thus, arc 1 in Fig. 7-5 has the label PUSH NP/, which
specifies that the machine should postpone its transition from state S/
to state S/SUBJ and instead make a transition from state S/ to NP/.
Similarly, a POP command may be a label for a “dangling arc,” the
head of which is not attached to a node. The meaning of a POP command
is that the machine should remove the name at the top of its
pushdown store and make a transition to the corresponding state. Thus,
suppose that the pushdown store should happen to contain the following
“stack” of names:

NP/DET
S/SUBJ
NP/N
S/

and suppose that the machine should happen to be in state S/VP; then
the POP command at arc 5 will specify that the machine should make a
transition from state S/VP to state NP/DET and that the stack of names
in the pushdown store should become

S/SUBJ
NP/N
S/ 
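The effect of the two commands on the pushdown store can be illustrated with an ordinary list used as a stack. This is a minimal sketch under assumptions made here: the top of the store is taken to be the end of the list, and the helper names `push` and `pop` are hypothetical, not part of Kaplan’s formalism.

```python
# The pushdown store from the example above, written bottom-to-top,
# so that the top of the store ("NP/DET") is the last element.
stack = ["S/", "NP/N", "S/SUBJ", "NP/DET"]
state = "S/VP"


def push(store, postponed_state, target_state):
    """PUSH: save the postponed destination and enter target_state instead."""
    store.append(postponed_state)
    return target_state


def pop(store):
    """POP: remove the name at the top of the store; it becomes the new state."""
    return store.pop()


state = pop(stack)
print(state, stack)
```

After the POP the machine is in state NP/DET and the store holds S/SUBJ, NP/N, and S/ (top first), as in the text. A PUSH NP/ taken from state S/ would correspond to `push(store, "S/SUBJ", "NP/")` in this sketch.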

Besides  this  ability  to  “transfer  control  recursively”  throughout 
the  network,  augmented  transition  networks  differ  from  finite-state 
transition  networks  in  another  manner:  Each  arc  may  be  allowed  to 
specify a condition and a sequence of actions; the actions that an arc
may specify are those of building and naming tree structures (the name
of a tree structure is known as its register, and registers are said to
“contain” their tree structures). Actions may specify various kinds of
changes  to  the  contents  of  registers  “in  terms  of  the  current  input  sym¬ 
bol,  the  previous  contents  of  registers,  and  the  results  of  lower-level 
computations  (pushes)”  (Kaplan,  1971).  The  “input  symbols”  that 
are  submitted  to  an  augmented  transition  network  are  English  words. 
The network of Fig. 7-5 is designed to start in state S/, with the first
word of a sentence being submitted to it; arcs that do not have PUSH or
POP commands attached may have either “word” or “category” statements
attached to them. Thus, arc 2 has the label CAT V, which means
that if the machine is in state S/SUBJ and the input symbol is a word
that is a verb, then the machine is to make a transition to state VP/V,
provided  the  additional  condition  for  arc  2  listed  in  the  table  of  “arcs, 
conditions,  and  actions”  below  the  diagram  is  satisfied.  The  condition 
for  an  arc  is  in  general  a  Boolean  combination  of  predicates  involving 
the  current  input  symbol  and  register  contents;  the  conditions  for  some 
arcs  (e.g.,  arc  7)  may  always  be  trivially  satisfied.  A  sentence  is  said 
to  be  accepted  by  an  augmented  transition  network  whenever  a  final 
state  (i.e.,  a  “dangling  arc”),  the  end  of  the  sentence,  and  an  empty 
pushdown  store  are  all  reached  at  the  same  time.  The  parsing  for  a 
sentence  provided  by  an  augmented  transition  network  corresponds 
simply  to  the  history  of  transitions,  pushes,  and  pops  required  to  ac¬ 
cept  it. 
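The acceptance rule just stated can be tried out on a very small network. The sketch below is an illustration only: the lexicon and the arcs are assumptions made here (they use state names that appear in the text, but they do not reproduce Fig. 7-5), and conditions, actions, and registers are omitted entirely, so what remains is really a plain recursive transition network searched depth-first.

```python
# Hypothetical lexicon and network; not a transcription of Fig. 7-5.
LEXICON = {"the": "DET", "man": "N", "ball": "N", "kicked": "V"}

# Each arc is (kind, label, destination state).
NETWORK = {
    "S/":     [("PUSH", "NP/", "S/SUBJ")],
    "S/SUBJ": [("CAT", "V", "VP/V")],
    "VP/V":   [("PUSH", "NP/", "S/VP"), ("JUMP", None, "S/VP")],
    "S/VP":   [("POP", None, None)],
    "NP/":    [("CAT", "DET", "NP/DET")],
    "NP/DET": [("CAT", "N", "NP/N")],
    "NP/N":   [("POP", None, None)],
}


def accepts(words, state="S/", store=()):
    """True if some path ends at a POP with no input left and an empty store."""
    for kind, label, dest in NETWORK[state]:
        if kind == "CAT":
            # Consume one word if it belongs to the required category.
            if words and LEXICON.get(words[0]) == label:
                if accepts(words[1:], dest, store):
                    return True
        elif kind == "PUSH":
            # Postpone the transition to dest; enter the sub-network instead.
            if accepts(words, label, store + (dest,)):
                return True
        elif kind == "JUMP":
            # Change state without advancing the input.
            if accepts(words, dest, store):
                return True
        elif kind == "POP":
            if store:
                if accepts(words, store[-1], store[:-1]):
                    return True
            elif not words:
                return True  # final state, end of sentence, empty store
    return False


print(accepts("the man kicked the ball".split()))
```

Because failed arcs simply return and the next arc is tried, the search backs up to alternatives automatically; a fuller model would also record the sequence of transitions, pushes, and pops, since that history is what constitutes the parsing.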

The  particular  grammar  shown  in  Fig.  7-5  will  not  be  discussed 
in  any  greater  detail.  However,  the  reader  interested  in  these  networks 
may  wish  to  see  if  he  can  understand  the  operation  of  this  example 
network on a simple sentence, shown in Fig. 7-6. A complete explanation
of this example is to be found in Kaplan (1971).⁹

A  few  sentences  are  sufficient  to  describe  the  nature  of  the  gram¬ 
mars  that  can  be  formulated  as  augmented  state  transition  networks, 
and  to  indicate  their  applicability  to  the  syntax  of  natural  languages. 
The use of PUSH and POP commands, and conditions, actions, and
registers  in  such  a  grammar  (network),  enables  it  to  try  out  different 
kinds  of  parsing  strategies  on  variably  large  phrases  in  a  sentence,  to 
store  information  relating  to  the  success  of  these  strategies  as  they  are 
being carried out, and to recognize whenever a given strategy has failed
so  that  a  new  strategy  can  be  tried.  If  this  is  contrasted  with  the  per¬ 
formance  offered  by  context-free  grammars,  the  differences  are  striking. 
A  parser  that  uses  a  phrase-structure  grammar  typically  has  a  large  set 
of  production  rules,  each  of  which  is  potentially  applicable  at  any  point 
in  its  analysis.  Such  a  parser  is  not  easily  made  to  simulate  strategic 
performance in the way it conducts its analysis. Even if a system
designer were to manage somehow to find a parser and a phrase-structure
grammar that would efficiently parse the sentences of a given subset
of English, he would in general find it difficult to extend his system to
a larger subset of English and still maintain its efficiency. By comparison,
it is relatively easy to add new strategic abilities to a network
grammar.

⁹ A helpful hint: The symbol * represents a special register in Kaplan’s
formalism which always contains the structure or word that “enabled” the most
recent transition of the machine. In most cases this is an input symbol; however,
whenever a POP command is executed, it is the value of that command’s argument.
Thus, the execution of POP(SBUILD) causes the value of the function
SBUILD to be placed in *. Also, the JUMP label on an arc indicates that a transition
is to be made without advancing the input sentence.

Sentence: The man kicked the ball.

    STRING = (THE MAN KICKED THE BALL)
    ENTERING STATE S/
    ABOUT TO PUSH
    ENTERING STATE NP/
    TAKING CAT DET ARC
    STRING = (MAN KICKED THE BALL)
    ENTERING STATE NP/DET
    TAKING CAT N ARC
    STRING = (KICKED THE BALL)
    ENTERING STATE NP/N
    ABOUT TO POP
    ENTERING STATE S/SUBJ
    TAKING CAT V ARC
    STRING = (THE BALL)
    ENTERING STATE VP/V
    STORING ALTARC ALTERNATIVE 76869 (a)
    ABOUT TO PUSH
    ENTERING STATE NP/
    TAKING CAT DET ARC
    STRING = (BALL)
    ENTERING STATE NP/DET
    TAKING CAT N ARC
    STRING = NIL
    ENTERING STATE NP/N
    ABOUT TO POP
    ENTERING STATE S/VP
    ABOUT TO POP
    SUCCESS
    10 ARCS ATTEMPTED
    195 CONSES (b)
    1.8869999 SECONDS (c)

    PARSINGS:
    S NP DET THE
         N MAN
      AUX TNS PAST
      VP V KICK
         NP DET THE
            N BALL

    a. The alternative analysis path starting with arc h is saved.
    b. Number of memory words used.
    c. Processing time required.

Figure 7-6. Trace of an analysis. (Kaplan, 1971, reprinted with permission.)

Another formalization of this approach to syntax is the programming
language PROGRAMMAR (Winograd, 1971, 1972). PROGRAMMAR
facilitates  the  writing  of  programs  that  can  act  as  grammars  and  parsers 
for  natural  languages.  It  is  specifically  designed  to  facilitate  the  descrip¬ 
tion  of  parsers  that  can  act  strategically  and  recursively,  and  to  enable 
the  designer  of  a  language-understanding  program  to  make  extensions 
to  his  system  in  a  fairly  straightforward  fashion.  It  is  closely  related  to 
the language PLANNER in the general philosophy of the programs it is
intended  to  encourage,  but  the  theory  underlying  its  orientation  to 
natural  language  is  actually  that  of  systemic  grammar,  an  outlook  on 
natural  language  that  has  been  developed  by  Halliday  (1961  et  seq.). 
Space does not permit a detailed discussion of PROGRAMMAR and the
theory of systemic grammar. However, an outline of the highlights
of these topics is provided below. The information given should
be  sufficient  for  the  general  reader  to  decide  whether  to  investigate 
them  further.  For  a  more  detailed  introduction,  Winograd’s  discussion 
of  these  subjects  is  readily  understandable  to  the  nonspecialist. 

Some  of  the  basic  tenets  of  systemic  grammar,  as  expressed  pre¬ 
viously,  are  repeated  as  follows: 

1.  The  purpose  of  natural  language  is  communication;  thus, 
the  syntactic  nature  of  language  must  be  understood  in  rela¬ 
tion  to  the  semantic  information  it  is  designed  to  carry. 

2. The problems of syntax, semantics, inference, and generation,
which are to be solved in the use of natural language, are
all closely interrelated; it is desirable that a language-understanding
program be able to solve these problems in
a highly integrated way.

3.  Despite  their  interrelations,  these  problems  are  in  many  ways 
quite  distinct.  Thus,  we  should  not  expect  that  a  system 
designed  to  solve  the  generation  problem  (e.g.,  transforma¬ 
tional  grammar;  see  Chomsky,  1959  et  seq.)  will  necessarily 
be  the  basis  of  an  efficient  system  to  solve  the  syntax  (in 
particular,  the  parsing)  problem. 

Conditions  (2)  and  (3)  above  will  be  considered  more  thoroughly 
in  the  following  subsections,  devoted  specifically  to  the  semantics, 
inference,  generation,  and  integration  problems.  To  be  considered  first 
is condition (1), the way in which systemic grammar and PROGRAMMAR
are  designed  to  understand  the  syntactic  nature  of  English  in  terms  of 
the  semantic  information  its  sentences  may  carry. 
A viewpoint common to the theories of both systemic and transformational
grammar is that the structure of a sentence is the result of
a  sequence  of  grammatical  choices  made  by  the  speaker  of  the  sen¬ 
tence.  Systemic  grammar  describes  a  specific  class  of  such  grammatical 
choices  and  specifies  the  effect  that  each  will  have  on  the  nature  of  the 
sentence being produced. Moreover, systemic grammar describes certain
relationships that exist among these grammatical choices, and
specifies  by  means  of  these  relationships  which  sequences  of  gram¬ 
matical  choices  may  produce  “meaningful”  sentences  and  which  may 
not.  When  a  person  makes  a  meaningful  sequence  of  grammatical 
choices  in  the  course  of  producing  a  sentence,  the  effect  of  the  choices 
he makes will be to provide the sentence with certain structural characteristics,
or features, which other people can use as an aid to the discovery
of the “meaning” of the sentence.

For  example,  every  sentence  must  have  structural  characteristics 
corresponding to exactly one of the three features: IMPERATIVE, DECLARATIVE,
or QUESTION. Thus, the speaker must make the grammatical
choice as to which of these three features he wants his sentence to have.
Again, if the speaker should choose to give his sentence the structural
characteristics corresponding to QUESTION, systemic grammar specifies
that he will also have to make a choice between the structural characteristics
corresponding to the features YES-NO and WH-QUESTION.¹⁰ The
features possessed by his sentence are, in effect, markers that people
may use to “understand” that it is a question and that it requires, say,
a yes or no answer. A set of features that form a mutually exclusive
set (e.g., YES-NO and WH-QUESTION) is said to be a system. The set of
other features that must be present for the grammatical choice between
the elements of a system to be possible is known as the entry condition¹¹
for that system. Thus, the entry condition for the system YES-NO,
WH-QUESTION is the feature QUESTION.

In  addition  to  sentences,  the  theory  of  systemic  grammar  specifies 
features  (and  systems  and  entry  conditions)  for  smaller  “syntactic 
units”  such  as  noun  groups,  prepositional  groups,  and  words.  (Thus, 
the  various  endings  that  a  word  might  have  are  considered  to  be  the 
“features”  it  may  possess;  the  word  itself  may  be  the  entry  condition 
for  the  system  of  its  endings.) 
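The notions of system and entry condition lend themselves to a small consistency check. The following is a minimal sketch under assumptions made here: only the two sentence-level systems named in the text are encoded, each entry condition is reduced to a single feature (or to none, for a system that every sentence enters), and the names `SYSTEMS` and `is_consistent` are hypothetical.

```python
# Each system is (entry condition, set of mutually exclusive features).
# An entry condition of None means every sentence enters the system.
SYSTEMS = [
    (None, {"IMPERATIVE", "DECLARATIVE", "QUESTION"}),
    ("QUESTION", {"YES-NO", "WH-QUESTION"}),
]


def is_consistent(features):
    """Check that exactly one feature is chosen from each entered system
    and that no feature is chosen from a system that was not entered."""
    features = set(features)
    for entry, choices in SYSTEMS:
        chosen = features & choices
        entered = entry is None or entry in features
        if entered and len(chosen) != 1:
            return False
        if not entered and chosen:
            return False
    return True


print(is_consistent({"QUESTION", "YES-NO"}))
print(is_consistent({"DECLARATIVE", "YES-NO"}))
```

A sentence marked both DECLARATIVE and YES-NO is rejected because the entry condition QUESTION is absent; a sentence marked QUESTION alone is also rejected, since the choice within the second system has not been made. Systems for smaller syntactic units could be added to the same table in the same way.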

¹⁰ A sentence that has the structural characteristics corresponding to the
feature WH-QUESTION must possess the feature QUESTION and must begin with
one of the words “what,” “why,” “who,” “where,” “how,” “which,” etc.

¹¹ More generally, an entry condition may be a Boolean formula, the terms
of which are features.

PROGRAMMAR is designed to facilitate the writing of programs
capable of implementing systemic grammars. The language-understanding
program that Winograd has written using PROGRAMMAR contains
many special subprograms, each designed to recognize a different feature
that can be possessed by certain strings of words. In addition, his
program is capable of recognizing the presence of entry conditions and
can call the appropriate feature-detecting programs when necessary.
This gives it the basic ability to carry out a systemic analysis of the
features possessed by an English sentence. Other parts of his
language-understanding program are capable of using the systemic analysis of a
sentence to construct a “semantic model” of its meaning, to integrate
this semantic model into its current “world model,” to prove theorems
and solve problems about and within the world model (using PLANNER),
to use the semantic model to detect errors in the systemic analysis of a
sentence (or part of a sentence) and redirect the analysis in a strategic
manner toward a more plausible parsing, and to generate appropriate
replies (e.g., answer questions) to sentences submitted by people.
Winograd’s program is capable of answering questions about itself
(its world model contains a simple “self-model”) and of remembering
and understanding the contexts of conversations. A sample conversation
with this program (called “SHRDLU”) is given at the end of this section.

Semantics  and  Inference 

“Meaning”  and  “semantic  information”  are  half-mysterious  con¬ 
cepts.  By  this  is  meant  that  people  are  unable  to  know  precisely  what 
effect  their  words  may  have  on  people,  whereas  they  can  know  exactly 
what  effect  their  words  may  have  on  machines. 

Thus,  in  the  preceding  pages  no  attempt  was  made  to  present 
very  concrete  definitions  of  the  meaning,  or  semantic  information,  that 
may  be  conveyed  by  the  sentences  of  a  natural  language.  To  have  done 
so  would  have  been  to  discuss  a  theory  of  human  psychology  (which 
causes  the  sentence  to  be  used;  which  is  partially  caused  by  the  use  of 
sentences);  such  a  discussion  would  eventually  be  desirable,  but  it  is 
not  necessary  here.  In  these  pages  the  primary  concern  is  with  viewing 
the  (relatively)  unmysterious  behavior  of  machines — unmysterious  be¬ 
cause  we  can  look  directly  at  their  inner  workings  and  at  the  data  stored 
in  their  memories. 

Because we can know and design the “psychology” of language-understanding
machines, the notions of meaning and semantic information
become “halfway more tractable.” We can define these concepts
rigorously for a language-understanding machine if it has been built (or
programmed or designed), but we have difficulty in defining these concepts
for the ultimate, truly intelligent machine that would understand
our language as well as we do, since we know as little about the internal
workings of that (nonexistent) machine as we know about the internal
workings of our own intelligence.

For  a  language-understanding  program,  the  “semantic  informa¬ 
tion”  carried  by  a  sentence  is  simply  the  data  structure  that  the  program 
creates  when  it  processes  (“understands”)  the  sentence.  The  problem 
of  semantics  is  to  discover  what  kinds  of  “semantic”  data  structures 
such  programs  should  create  in  order  to  provide  the  best  solutions  to 
the  problems  of  inference  and  generation.  The  problems  of  inference 
and  generation  are  to  discover  how  language-understanding  programs 
should use their semantic data structures to produce the kind of behavior
that people would accept as evidence that the programs “understand”
language. So far, the greatest success in solving the problems of
semantics, inference, and generation has been in enabling machines to
understand the relatively factual, logical, nonpsychological aspects of
language use. However, some investigators (notably Colby, Schank, Tesler,
Enea, Abelson, and Carroll) have been concerned with developing
programs with an aptitude for understanding the emotional, metaphorical,
and otherwise psychological aspects of meaning.

Clearly, the problem of semantics can be minimized by restricting
the “environment” or “problem domain” that one’s language-understanding
program is supposed to “understand”; some of the earliest
language-understanding programs (e.g., SAD-SAM; see R. K. Lindsay,
1963) did exactly this. They minimized their problem of semantics by
severely restricting the type of questions they could accept, information
they could store, and problems they could solve. To a lesser extent,
the more recent “specialized question answerers” (e.g., STUDENT,
CARPS, HAPPINESS; see Bobrow, 1968, Charniak, 1969, Gelb, 1971a,b)
have adopted the same policy.

Several approaches, which may ultimately be developed into a
workable, general semantics-inference formalism, have been suggested.
These may be grouped into two classes (which are, however, somewhat
indistinct): the predicate calculus formalism and the graph-structure
formalism. The predicate calculus formalism was investigated by Coles
(1969) and C. C. Green (1969), who showed that it is possible to
translate relatively simple natural language questions into example-construction
problems that can be posed in first-order predicate calculus
and solved using the resolution technique (see Chapter 6). This approach
seems plausible because of (1) the generality of first-order predicate
calculus as a language for the statement of facts and problems and
(2) the completeness (and consequent problem-solving generality) of
the resolution procedure.
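
To make the idea of answering a question by resolution concrete, here is a minimal sketch in Python. It is deliberately restricted to ground (variable-free) clauses, so it needs no unification and is much weaker than the first-order procedure used by Coles and Green; the literal names and the helper functions negate, resolvents, refutes, and entails are all illustrative assumptions. A question is answered by adding its negation to the facts and searching for the empty clause.

```python
from itertools import combinations


def negate(literal):
    """Negate a literal written as a string; '~' marks negation."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    out = []
    for literal in c1:
        if negate(literal) in c2:
            out.append(frozenset((c1 - {literal}) | (c2 - {negate(literal)})))
    return out


def refutes(clauses):
    """True if the empty clause is derivable, i.e. the set is unsatisfiable."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolvents(c1, c2):
                if not resolvent:          # empty clause: contradiction found
                    return True
                new.add(resolvent)
        if new <= known:                   # nothing new can be derived
            return False
        known |= new


def entails(facts, goal):
    """Does `goal` follow from `facts`?  Refute facts plus the negated goal."""
    clauses = [frozenset(c) for c in facts] + [frozenset({negate(goal)})]
    return refutes(clauses)


facts = [
    {"IBM7094(MAX)"},                      # Max is an IBM-7094
    {"~IBM7094(MAX)", "COMPUTER(MAX)"},    # ground instance of "every IBM-7094 is a computer"
]
print(entails(facts, "COMPUTER(MAX)"))
print(entails(facts, "PRINTER(MAX)"))
```

Because the set of possible ground clauses is finite, the loop always terminates; with variables, the same refutation strategy requires unification and may not terminate on non-theorems.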

As previous chapters have shown, graphs are a type of “mathematical
construct” that AI researchers find useful in describing much of
their  work.  For  example,  state-space  problems,  finite-state  automata, 
augmented  transition  networks,  and  “flowcharts”  for  computer  pro¬ 
grams  (see  Chapin,  1971b;  Rodriguez,  1969)  may  all  be  represented 
by  structures  that  are  essentially  graphs.  Winston’s  work  (1970),  dis¬ 
cussed  in  Chapter  5,  showed  that  graph  structures  can  be  used  to  de¬ 
scribe  visual  patterns.  In  general,  anything  that  has  a  “structural  nature” 
and  can  be  described  as  a  collection  of  parts  existing  in  various  rela¬ 
tionships  to  each  other  may  be  represented  by  a  graph.  In  particular, 
our  examples  show  that  graphs  may  be  profitably  used  to  describe 
certain  types  of  problems,  “processes”  (i.e.,  automata),  and  patterns. 
As  an  approach  to  the  semantic-inference  problems  in  language  under¬ 
standing,  the  graph-structure  formalism  consists  of  attempts  to  model 
the  “meaning”  of  sentences  and  words  by  graphs.  The  plausibility  of 
this  approach  is  supported  by  the  fact  that  real-world  situations,  which 
may  be  described  by  sentences  (and  which  may  help  cause  the  use  of 
sentences,  or  be  partially  caused  by  the  use  of  sentences),  often  have  a 
structural  nature.  The  best  way  of  describing  the  structural  nature  of 
general,  real-world  situations  is  still  not  known.  However,  the  utility  of 
the  graph-structure  formalism  should  be  apparent  if  we  simply  note  a 
few  examples  of  its  use. 

One  of  the  earliest  studies  of  the  graph-structure  formalism  was 
conducted  by  Quillian  (1966),  who  developed  an  elegant  model  of 
semantic  memory.  Information  is  represented  in  this  semantic  memory 
by  a  graph  structure  of  arbitrary  size  in  which  each  node  is  named  by  a 
word  and  the  arcs  between  nodes  represent  certain  specific  relationships, 
or  associative  links,  that  may  exist  between  words.  Nodes  are  of  two 
kinds:  types  and  tokens.  A  type  node  represents  the  “meaning”  of  its 
name  word;  the  associative  links  going  from  a  type  node  lead  to  a 
configuration,  or  plane,  of  token  nodes  that  represents  a  definition  of 
this  “meaning”;  the  only  purpose  of  token  nodes  is  to  be  used  in  such 
definitions.  Thus,  a  token  node  represents  a  “use”  of  its  name  word. 
Two  additional  constraints  are  imposed:  For  any  given  token  node 
there  must  be  exactly  one  type  node  bearing  the  same  name  word,  and 
the  two  nodes  are  to  be  connected  by  a  special  “token-to-type”  associa¬ 
tive  link.  For  each  meaning  of  an  English  word  there  must  be  exactly  one 
type node; a word like plant, which has multiple meanings, is represented
by multiple type nodes plant, plant1, plant2, etc. In diagrams,
type nodes are circled, whereas token nodes are simply indicated
by the presence of their name words. Figure 7-7 shows the different
kinds of associative links used in Quillian’s semantic memory model.
Figure 7-8A shows some planes stored in a semantic memory representing
definitions for plant; Fig. 7-8B represents food. Quillian wrote a
program that could do “associative” processing on this kind of memory
and  demonstrated  that  it  could  “compare  concepts”  and  discover  inter¬ 
relationships  not  indicated  specifically  in  its  “definition  planes.”  Es¬ 
sentially,  the  mechanism  for  comparing  two  concepts  was  a  breadth- 
first,  bidirectional  search  through  the  graph  structure  of  the  memory 
(see  Fig.  7-9).  The  next  section  presents  some  computer-produced 
concept  comparisons.  (Quillian’s  paper  presented  intriguing  discussions 
on  the  similarities  of  this  model  to  human  concept  comparison  and  on 
the  difficulties  of  making  dictionaries.) 
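
The comparison mechanism can be sketched as follows. This is a greatly simplified illustration, not Quillian’s program: the MEMORY dictionary is a hypothetical fragment with no type/token distinction and no labeled links, and the function names intersect and path are assumptions. The search expands one breadth-first layer from each concept in turn and stops at the first node reached from both sides.

```python
from collections import deque

# Hypothetical fragment: each word sense points to the senses in its definition.
MEMORY = {
    "cry2": ["make", "sad", "sound"],
    "comfort3": ["make2", "less2", "sad"],
}


def path(parents, node):
    """Walk parent links back to the start concept and return the path."""
    out = []
    while node is not None:
        out.append(node)
        node = parents[node]
    return out[::-1]


def intersect(memory, a, b):
    """Return (node, path_from_a, path_from_b) for the first intersection, or None."""
    if a == b:
        return a, [a], [b]
    parents = [{a: None}, {b: None}]       # one search tree per concept
    frontiers = [deque([a]), deque([b])]
    side = 0
    while frontiers[0] or frontiers[1]:
        frontier = frontiers[side]
        for _ in range(len(frontier)):     # expand exactly one layer
            node = frontier.popleft()
            for nxt in memory.get(node, []):
                if nxt in parents[side]:
                    continue
                parents[side][nxt] = node
                if nxt in parents[1 - side]:   # reached from both concepts
                    return nxt, path(parents[0], nxt), path(parents[1], nxt)
                frontier.append(nxt)
        side = 1 - side                    # alternate between the two concepts
    return None


print(intersect(MEMORY, "cry2", "comfort3"))
```

The two returned paths correspond to the two halves of a comparison path; a program like Quillian’s would then turn each half into a sentence.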

Among the more recent graph-structure formalisms for semantic
information storage are Schank’s (1970 et seq.) “conceptual dependency
graphs” (see Fig. 7-10), Shapiro’s (1971a,b) MENS system (see Fig.
7-11), and the “hierarchical graphs” of Pratt (1969 et seq.).

As  mentioned  before,  the  graph-structure  formalism  and  the  predi¬ 
cate  calculus  formalism  are  somewhat  indistinct.  This  is  true  because 
predicate  calculus  expressions  (of  any  order)  may  be  stored  in  graph 
structures  and  because  predicate  calculus  expressions  may  describe 
properties  of  graph  structures.  One  of  the  early  language-understanding 
programs  that  stored  predicate  calculus  expressions  in  graph  structures 
was  SIR,  written  by  Raphael  in  1964  (also  see  Simmons  and  Bruce, 
1971). 

An  important  relationship  between  the  problems  of  semantics  and 
inference  is  described  by  the  principle  of  homogeneity:  The  operations 
used  to  process  semantic  information  should  themselves  be  describable 
as  semantic  information  and  stored  in  a  common  semantic  memory  with 
other  information.  This  principle  dates  back  to  the  “stored  program” 
concept  formulated  in  the  early  years  of  computer  science,  but  it  has 
often  been  rediscovered  by  the  designers  of  language-understanding 
systems.  Among  the  studies  following  this  principle  are  those  presented 
by  Quillian  (1969),  Shapiro  (1971),  Hewitt  (1968  et  seq.),  Norman 
(1972),  Sussman  (1972),  and  R.  C.  Moore  (1973).  Winston’s  (1970) 
structure-recognizing  program,  discussed  in  Chapter  5,  should  also  be 
mentioned  in  this  regard. 

As for SHRDLU, it incorporates the graph-structure formalism, the
predicate calculus formalism, and the principle of homogeneity. The
features of sentences detected by its systemic parser can be translated
readily into conjunctions, disjunctions, conditionals (“if . . . then . . .
else” statements), etc., in the PLANNER formalism. The evaluation of a
PLANNER theorem corresponding to an English statement constitutes
the major part of the “inference process” performed by SHRDLU (Winograd,
1971; also see Chapters 5 and 6).

Figure 7-7. Associative links. (Quillian, 1966, reprinted with
permission.) The legend of the diagram reads:

Associative link (type-to-token and token-to-token, used within a plane):

(Only where A is a type node) B names a class of which A is a subclass.

(Only where A is a token node) B modifies A.

A, B, and C form a disjunctive set.

A, B, and C form a conjunctive set.

B, a subject, is related to C, an object, in the manner specified by
A, the relation. Either the link to B or to C may be omitted in a
plane, which implies that A’s normal subject or object is to be assumed.

Associative link (token-to-type, used only between planes):

A, B, and C are token nodes for, respectively, the type nodes A, B, and C.

Figure 7-8B. Definition plane representing “food.” (Quillian, 1966,
reprinted with permission.) The definition shown reads: FOOD: 1. That
which living being has to take in to keep it living and for growth.
Things forming meals, especially other than drink.

Figure 7-9. A comparison path for “comfort” and “cry.” (Quillian, 1966,
reprinted with permission.)

Figure 7-10. A conceptual dependency graph for “I saw the Grand Canyon
flying to Chicago.” (Schank, 1971, reprinted with permission.)

Figure 7-11. A MENS structure for the deduction rule “Every man is
human.” (Shapiro, 1971, reprinted with permission.)

Generation  and  Integration 

The problem of generation is largely unsolved by current language-understanding
programs; even SHRDLU uses essentially a “blank-filler”
scheme. Perhaps something like Chomsky’s transformational grammar
(1959  et  seq.)  may  eventually  be  implemented,  but  it  seems  likely  that 
efforts  should  be  devoted  first  to  the  “comprehension”  stage  (syntax, 
semantics,  inference)  of  the  language-understanding  process.  By  analogy, 
it  has  been  noted  that  the  ability  of  children  to  comprehend  sentences 
at  a  given  “level  of  difficulty”  precedes  the  ability  to  speak  them. 

The  problem  of  integration  will  not  be  discussed  in  detail.  Rela¬ 
tively  little  is  known  about  the  integration  of  the  sentence-generation 
process  with  the  sentence-comprehension  process,  nor  about  the  inte¬ 
gration  of  the  language-understanding  process  with  the  language-learn¬ 
ing  process — except,  of  course,  for  the  integration  automatically  implied 
by  the  principle  of  homogeneity  (we  can  tell  the  machine  new  rules  of 
grammar, meanings for words, etc.; see Quillian, 1969). A good discussion
of the integration problem is provided by R. K. Lindsay (1971),
who  identified  the  jigsaw-puzzle  heuristic  for  integrated  methods  of 
problem  solving,  learning,  and  memory  repair. 

In 1972 a large number of papers were written that are relevant
to this and other major problems of AI research on language-understanding
systems. In particular, these papers present a variety of new approaches
to the representation problem. Unfortunately, there has not
been time to incorporate discussions of these papers here. Instead, the
interested reader is referred to the papers (cited in the Bibliography)
written by the following authors: Biss, Chandra, Charniak, Coles, Fang,
Feldman, Gibbons, Kuno, R. C. Moore, Norman, Pylyshyn, Rulifson,
Rumelhart, Schank, Sirovich, and Wegbreit. (This list is, of course, not
exhaustive.) In addition, Raphael and Robinson (1972) present a
bibliography of 200 references on the subject of “computer semantics.”

Some  Conversations  with  Computers 

This  section  is  devoted  to  letting  the  machines  speak  for  them¬ 
selves.  However,  some  necessary  comments  are  provided,  in  italics,  by 
the present author.

STUDENT
(Bobrow,  1964) 

(See  the  section  on  “Syntax”  of  this  chapter.) 

(THE  PROBLEM  TO  BE  SOLVED  IS) 

(THE  RUSSIAN  ARMY  HAS  6  TIMES  AS  MANY  RESERVES  IN  A  UNIT 
AS  IT  HAS  UNIFORMED  SOLDIERS.  THE  PAY  FOR  RESERVES  EACH 
MONTH  IS  50  DOLLARS  TIMES  THE  NUMBER  OF  RESERVES  IN  THE 
UNIT,  AND  THE  AMOUNT  SPENT  ON  THE  REGULAR  ARMY  EACH 
MONTH  IS  $150  TIMES  THE  NUMBER  OF  UNIFORMED  SOLDIERS. 
THE  SUM  OF  THIS  LATTER  AMOUNT  AND  THE  PAY  FOR  RESERVES 
EACH  MONTH  EQUALS  $45000.  FIND  THE  NUMBER  OF  RESERVES 
IN  A  UNIT  THE  RUSSIAN  ARMY  HAS  AND  THE  NUMBER  OF  UNI¬ 
FORMED  SOLDIERS  IT  HAS.) 

(THE  EQUATIONS  TO  BE  SOLVED  ARE) 

(EQUAL  G02532  (NUMBER  OF  UNIFORMED  SOLDIERS  (IT  /  PRO) 
(HAS  /  VERB))) 

(EQUAL  G02531  (NUMBER  OF  RESERVES  IN  UNIT  RUSSIAN  ARMY 
(HAS  /  VERB))) 

(EQUAL  (PLUS  (AMOUNT  SPENT  ON  REGULAR  ARMY  EACH  MONTH) 
(PAY  FOR  RESERVES  EACH  MONTH))  (TIMES  45000  (DOLLARS))) 

(EQUAL (AMOUNT SPENT ON REGULAR ARMY EACH MONTH)
(TIMES (TIMES 150 (DOLLARS)) (NUMBER OF UNIFORMED SOLDIERS)))

(EQUAL  (PAY  FOR  RESERVES  EACH  MONTH)  (TIMES  (TIMES  50 
(DOLLARS))  (NUMBER  OF  RESERVES  IN  UNIT))) 

(EQUAL  (NUMBER  OF  RESERVES  IN  UNIT  RUSSIAN  ARMY  (HAS  / 
VERB))  (TIMES  6  (NUMBER  OF  UNIFORMED  SOLDIERS  (IT  /  PRO) 
(HAS  /  VERB)))) 

THE  EQUATIONS  WERE  INSUFFICIENT  TO  FIND  A  SOLUTION 
(ASSUMING  THAT) 

((NUMBER OF UNIFORMED SOLDIERS) IS EQUAL TO (NUMBER OF
UNIFORMED SOLDIERS (IT / PRO) (HAS / VERB)))

(ASSUMING  THAT) 

((NUMBER OF RESERVES IN UNIT) IS EQUAL TO (NUMBER OF
RESERVES IN UNIT RUSSIAN ARMY (HAS / VERB)))

(THE NUMBER OF RESERVES IN A UNIT THE RUSSIAN ARMY HAS
IS 600)

(THE  NUMBER  OF  UNIFORMED  SOLDIERS  IT  HAS  IS  100) 
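
Once the identifying assumptions are made, the equations above reduce to two linear equations in two unknowns. The following Python sketch, which is not part of STUDENT and uses an illustrative helper named solve_2x2, checks the arithmetic with exact fractions. Writing x for the number of reserves and y for the number of uniformed soldiers, the system is x - 6y = 0 and 50x + 150y = 45000.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the equations are not independent")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# x = reserves in a unit, y = uniformed soldiers
reserves, soldiers = solve_2x2(1, -6, 0, 50, 150, 45000)
print(reserves, soldiers)   # 600 100
```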

STUDENT
(Bobrow,  1964) 

(THE  PROBLEM  TO  BE  SOLVED  IS) 

(THE  GROSS  WEIGHT  OF  A  SHIP  IS  20000  TONS.  IF  ITS  NET 
WEIGHT  IS  15000  TONS,  WHAT  IS  THE  WEIGHT  OF  THE  SHIPS 
CARGO  Q.) 

THE  EQUATIONS  WERE  INSUFFICIENT  TO  FIND  A  SOLUTION 
TRYING  POSSIBLE  IDIOMS 

(DO  YOU  KNOW  ANY  MORE  RELATIONSHIPS  AMONG  THESE 
VARIABLES) 

(GROSS  WEIGHT  OF  SHIP) 

(TONS) 

(ITS  NET  WEIGHT) 

(WEIGHT  OF  SHIPS  CARGO) 
yes 

TELL  ME 

(the  weight  of  a  ships  cargo  is  the  difference  between  the  gross 
weight  and  the  net  weight) 

THE  EQUATIONS  WERE  INSUFFICIENT  TO  FIND  A  SOLUTION 
(ASSUMING  THAT) 

((NET  WEIGHT)  IS  EQUAL  TO  (ITS  NET  WEIGHT)) 

(ASSUMING  THAT) 

((GROSS  WEIGHT)  IS  EQUAL  TO  (GROSS  WEIGHT  OF  SHIP)) 

(THE WEIGHT OF THE SHIPS CARGO IS 5000 TONS)

CARPS

(Charniak, 1969)

CARPS is a question-answerer designed to solve calculus “rate problems,”
stated in English. After receiving the problem statement, it transforms
the sentences into successive list structures and builds a tree
structure (not shown here) to model the information they contain.

(WATER  IS  FLOWING  INTO  A  CONICAL  FILTER  AT  THE  RATE  OF 
15.0  CUBIC  INCHES  PER  SECOND  /.  IF  THE  RADIUS  OF  THE  BASE 
OF  THE  FILTER  IS  5.0  INCHES  AND  THE  ALTITUDE  IS  10.0  INCHES 
/,  FIND  THE  RATE  AT  WHICH  THE  WATER  LEVEL  IS  RISING  WHEN 
THE  VOLUME  IS  100.0  CUBIC  INCHES  /.) 

↓

(((WATER (FLOWING VERB) (INTO PREP) A (CONICAL ADJ) FILTER
(AT PREP) (RATE RWORD) 15.0 (IN3 UNIT) PER (SEC UNIT)) (1)) ((IF
THE RADIUS OF THE BASE OF THE FILTER (IS VERB) 5.0 (IN UNIT)
AND THE ALTITUDE (IS VERB) 10.0 (IN UNIT), (FIND QWORD) (RATE
RWORD) AT WHICH THE WATER LEVEL (RISING VERB) WHEN THE
VOLUME (IS VERB) 100.0 (IN3 UNIT)) (2)))

↓

(((WATER (FLOWING VERB) (INTO PREP) A (CONICAL ADJ) FILTER)
(1)) ((WATER (FLOWING VERB) (AT PREP) (RATE RWORD) 15.0 (IN3
UNIT) PER (SEC UNIT)) (1)) ((THE RADIUS OF THE BASE OF THE
FILTER (IS VERB) 5.0 (IN UNIT)) (2)) ((THE ALTITUDE (IS VERB) 10.0
(IN UNIT)) (2)) (((FIND QWORD) (RATE RWORD) AT WHICH THE
WATER LEVEL (RISING VERB)) (2)) ((THE VOLUME (IS VERB) 100.0
(IN3 UNIT)) (2 WHEN)))

↓

(THE EQUATION SET IS)

1 ((EQUAL (G0005) (DERIV (G0004 WATER FILTER)))

2 (EQUAL (QUOTIENT (TIMES 15.0 (TIMES (EXPT IN 3) TIM)) SEC)
(TIMES (G0004 WATER FILTER) 0.33333300 PI (EXPT (RADIUS BASE
WATER FILTER) 2)))

3 (EQUAL (TIMES (RADIUS BASE WATER FILTER) (TIMES 10.0 IN))
(TIMES (G0004 WATER FILTER) (TIMES 5.0 IN))))

↓

(THE ANSWER IS)

(TIMES .53132943 IN (EXPT SEC -1.0) (EXPT PI -0.33333332))
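
The answer can be checked independently of CARPS. The Python sketch below is an illustration only (the function name level_rate and its parameters are assumptions): it uses similar triangles to express the radius of the water surface in terms of the water level, inverts the cone-volume formula to find the level at the stated volume, and then divides the inflow by the area of the water surface.

```python
import math


def level_rate(inflow, base_radius, altitude, volume):
    """Rate of rise of the water level in a cone holding `volume` of water."""
    k = base_radius / altitude                       # r = k*h by similar triangles
    h = (3 * volume / (math.pi * k ** 2)) ** (1 / 3)  # from V = (1/3)*pi*(k*h)^2*h
    return inflow / (math.pi * (k * h) ** 2)          # dV/dt = pi*r^2 * dh/dt


rate = level_rate(inflow=15.0, base_radius=5.0, altitude=10.0, volume=100.0)
carps = 0.53132943 * math.pi ** (-1 / 3)              # the expression printed by CARPS
print(round(rate, 4), round(carps, 4))
```

Both values agree to about 0.363 inches per second, so the symbolic answer printed by CARPS is consistent with the problem as stated.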

HAPPINESS
(Gelb, 1971)


(PLEASE INPUT PROBLEM)

(From a zorch containing 4 ferd and 3 brakky and 5 chartreuse
werfels, 3 are drawn. What is the probability that 2 are chartreuse
and the other brakky?)

(FROM A ZORCH CONTAINING 4 FERD AND 3 BRAKKY AND 5
CHARTREUSE WERFELS, 3 ARE DRAWN. WHAT IS THE PROBABILITY
THAT 2 ARE CHARTREUSE AND THE OTHER BRAKKY?)

(ELAPSED  TIME  NOW  0.09299999  SECONDS) 

(AFTER  PRIMARY  (IDIOMATIC)  TRANSFORMATION,  PROBLEM  BE¬ 
COMES) 

(FROM 1 ZORCH CONTAINING 4 FERD AND 3 BRAKKY AND 5
CHARTREUSE WERFELS, 3 ARE DRAWN. WHAT IS THE PROBABILITY
THAT 2 ARE CHARTREUSE AND 1 BRAKKY?)

(ELAPSED  TIME  NOW  0.384  SECONDS) 

(AFTER  SECONDARY  (REARRANGING)  TRANSFORMATION:) 

(THE  SIMPLE  SENTENCES  ARE) 

((CONTAIN  /  ACTIVE  (1  ZORCH)  (4  FERD  AND  3  BRAKKY  AND  5 
CHARTREUSE  WERFELS)) 

((DRAW  /  PASSIVE  (3)  NIL) 

(THE  QUESTION  SENTENCE  IS) 

(WHAT  IS  THE  PROBABILITY  THAT  2  ARE  CHARTREUSE  AND  1 
BRAKKY?) 

(ELAPSED  TIME  NOW  0.227  SECONDS) 

(AFTER  TERTIARY  (CASE-REDUCING)  TRANSFORMATION,  THE 
QUESTION  SENTENCE  BECOMES:) 

(WHAT  IS  THE  PROBABILITY  OF  GETTING  2  CHARTREUSE  AND  1 
BRAKKY?) 

(ELAPSED  TIME  NOW  0.108  SECONDS) 

(NO  RECOGNIZABLE  KEYWORDS  FOUND  IN  PROBLEM. 

ASSUMED  SAMPLING  PROBLEM.) 

(AFTER SYNTAX ANALYSIS OF QUESTION SENTENCE:)

(ASSUMING THAT - 3 BRAKKY - MEANS - 3 BRAKKY WERFELS -)
(ASSUMING THAT - 4 FERD - MEANS - 4 FERD WERFELS -)
(ASSUMING THAT - 3 - MEANS - 3 WERFELS -)

(DESCLIST FOR THIS PROBLEM CONTAINS:)

PROBLEMTYPE- SAMPLING

POPULATION- ((5 CHARTREUSE WERFELS) (3 BRAKKY WERFELS)
(4 FERD WERFELS)) AS OBJECTS

SAMPLESIZE/TRIALS- 3
SIMPLE EVENTS- (G12503 G12502)

G12503- ((WERFEL) (BRAKKY) (1) NIL NIL NIL)

G12502- ((WERFEL) (CHARTREUSE) (2) NIL NIL NIL)
COMPOUND EVENT STRUCTURE- (AND (OR G12502) (OR G12503))
REPLACEMENT INVOLVED? NO

(ELAPSED TIME NOW 0.715 SECONDS)

(FIRST LEVEL SOLUTION TO PROBLEM IS)

(PLUSF (PROB (QUOTE (G12502 G12503))))

(TIME FOR EVALUATION WAS 0.032 SECONDS)

(SECOND LEVEL SOLUTION TO PROBLEM IS)

(PLUSFN (PR (QUOTE (G12502 G12503))))

(TIME FOR EVALUATION WAS 0.034 SECONDS)

(THIRD LEVEL SOLUTION TO PROBLEM IS)

(PLUSFRAC (SIMPLIFYFRAC (LIST (COMBINL 5 2) (COMBINL 3 1)
(COMBINL 4 0))
(COMBINL 12 3)))

(TIME FOR EVALUATION WAS 0.14 SECONDS)

(FOURTH  LEVEL  SOLUTION  TO  PROBLEM  IS) 

3/22  (OR  0.1363636) 

(ELAPSED  TIME  NOW  0.134  SECONDS) 

(TOTAL  TIME  FOR  PROBLEM  SOLUTION  WAS  1.882  SECONDS) 
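
The third-level solution is a ratio of binomial coefficients for sampling without replacement, and it can be reproduced directly. The Python sketch below is an independent check, not a reconstruction of HAPPINESS; the function name sampling_probability and the dictionary layout are assumptions.

```python
from fractions import Fraction
from math import comb


def sampling_probability(population, wanted):
    """Probability of drawing exactly wanted[k] items of each kind k,
    without replacement, when the sample size is the sum of `wanted`."""
    total = sum(population.values())
    drawn = sum(wanted.values())
    ways = 1
    for kind, count in population.items():
        ways *= comb(count, wanted.get(kind, 0))   # kinds not wanted contribute C(n, 0) = 1
    return Fraction(ways, comb(total, drawn))


p = sampling_probability(
    {"chartreuse": 5, "brakky": 3, "ferd": 4},
    {"chartreuse": 2, "brakky": 1},
)
print(p, float(p))   # 3/22 0.13636363636363635
```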

SIR

(Raphael, 1964)

(See the preceding section of this chapter.)

(THE  NEXT  SENTENCE  IS  .  .  .) 

(MAX  IS  AN  IBM-7094) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETR-SELECT 

((UNIQUE  .  MAX)  (GENERIC  .  IBM-7094)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRS 

(MAX  IBM-7094) 

(ITS  REPLY  .  .  .) 

(I  UNDERSTAND  THE  ELEMENTS  RELATION  BETWEEN  MAX  AND 
IBM-7094) 

(I  UNDERSTAND  THE  MEMBER  RELATION  BETWEEN  IBM-7094  AND 
MAX) 

(THE  NEXT  SENTENCE  IS  .  .  .) 

(AN  IBM-7094  IS  A  COMPUTER) 

(THE FUNCTION USED IS . . .)

SETR-SELECT 

((GENERIC  .  IBM-7094)  (GENERIC  .  COMPUTER)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETR 

(IBM-7094  COMPUTER) 

(ITS  REPLY  .  .  .) 

(I  UNDERSTAND  THE  SUPERSET  RELATION  BETWEEN  COMPUTER 
AND  IBM-7094) 

(I  UNDERSTAND  THE  SUBSET  RELATION  BETWEEN  IBM-7094  AND 
COMPUTER) 

(THE  NEXT  SENTENCE  IS  .  .  .) 

(IS  MAX  A  COMPUTER  Q) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETRQ-SELECT 

((UNIQUE  .  MAX)  (GENERIC  .  COMPUTER)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRSQ 

(MAX  COMPUTER) 

(ITS  REPLY  .  .  .) 

YES

(THE NEXT SENTENCE IS . . .)

(THE  BOY  IS  AN  MIT-STUDENT) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETR-SELECT 

((SPECIFIC  .  BOY)  (GENERIC  .  MIT-STUDENT)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRS1

(BOY  MIT-STUDENT) 

(ITS  REPLY  .  .  .) 

(G02840  IS  A  BOY) 

(I  UNDERSTAND  THE  ELEMENTS  RELATION  BETWEEN  G02840  AND 
BOY) 

(I  UNDERSTAND  THE  MEMBER  RELATION  BETWEEN  BOY  AND 
G02840) 

(I  UNDERSTAND  THE  ELEMENTS  RELATION  BETWEEN  G02840  AND 
MIT-STUDENT) 

(I  UNDERSTAND  THE  MEMBER  RELATION  BETWEEN  MIT-STUDENT 
AND  G02840) 

(THE  NEXT  SENTENCE  IS  .  .  .) 

(EVERY  MIT-STUDENT  IS  A  BRIGHT-PERSON) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETR-SELECT 

((GENERIC  .  MIT-STUDENT)  (GENERIC  .  BRIGHT-PERSON)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETR 

(MIT-STUDENT  BRIGHT-PERSON) 

(ITS  REPLY  .  .  .) 

(I  UNDERSTAND  THE  SUPERSET  RELATION  BETWEEN  BRIGHT- 
PERSON  AND  MIT-STUDENT) 

(I UNDERSTAND THE SUBSET RELATION BETWEEN MIT-STUDENT
AND BRIGHT-PERSON)

(THE NEXT SENTENCE IS . . .)

(IS  THE  BOY  A  BRIGHT-PERSON  Q) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETRQ-SELECT 

((SPECIFIC  .  BOY)  (GENERIC  .  BRIGHT-PERSON)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRSIQ 

(BOY  BRIGHT-PERSON) 

(ITS  REPLY  .  .  .) 

YES 


(THE  NEXT  SENTENCE  IS  .  .  .) 
(JOHN IS A BOY)

(THE FUNCTION USED IS . . .)

SETR-SELECT 

((UNIQUE  .  JOHN)  (GENERIC  .  BOY)) 

(THE  REPLY  .  .  .) 

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRS 

(JOHN  BOY) 

(ITS  REPLY  .  .  .) 

(I UNDERSTAND THE ELEMENTS RELATION BETWEEN JOHN AND
BOY) 

(I  UNDERSTAND  THE  MEMBER  RELATION  BETWEEN  BOY  AND 
JOHN) 

(THE  NEXT  SENTENCE  IS  .  .  .) 

(IS  THE  BOY  A  BRIGHT-PERSON  Q) 

(THE  FUNCTION  USED  IS  .  .  .) 

SETRQ-SELECT 

((SPECIFIC  .  BOY)  (GENERIC  .  BRIGHT-PERSON)) 

(THE REPLY . . .)

(THE  SUB-FUNCTION  USED  IS  .  .  .) 

SETRSIQ 

(BOY  BRIGHT-PERSON) 

(ITS  REPLY  .  .  .) 

(WHICH  BOY.  .  .  (G02840  JOHN)) 
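
The behavior in this transcript (storing element and subset relations, chaining them to answer a question, and asking “which boy” when a specific reference becomes ambiguous) can be imitated with a small Python sketch. It is a toy model under stated assumptions, not Raphael’s program: the class SetMemory, its method names, the generated symbol G1, and the reply strings other than YES are all hypothetical.

```python
class SetMemory:
    """Toy store of element-of and subset-of relations."""

    def __init__(self):
        self.classes_of = {}     # individual -> classes it is directly an element of
        self.supersets_of = {}   # class -> its direct supersets
        self.counter = 0

    def add_element(self, individual, cls):
        self.classes_of.setdefault(individual, set()).add(cls)

    def add_subset(self, sub, sup):
        self.supersets_of.setdefault(sub, set()).add(sup)

    def elements(self, cls):
        """Individuals stored as direct elements of cls, in sorted order."""
        return sorted(x for x, cs in self.classes_of.items() if cls in cs)

    def add_specific(self, cls, other):
        """'THE cls IS A other': reuse the one known element or invent a symbol."""
        known = self.elements(cls)
        if len(known) > 1:
            return f"WHICH {cls}? {known}"
        if known:
            individual = known[0]
        else:
            self.counter += 1
            individual = f"G{self.counter}"
            self.add_element(individual, cls)
        self.add_element(individual, other)
        return individual

    def is_element(self, individual, cls):
        """Follow one element-of link, then subset-of links transitively."""
        pending = list(self.classes_of.get(individual, ()))
        seen = set()
        while pending:
            current = pending.pop()
            if current == cls:
                return True
            if current not in seen:
                seen.add(current)
                pending.extend(self.supersets_of.get(current, ()))
        return False

    def ask_specific(self, cls, other):
        """'IS THE cls A other?'"""
        known = self.elements(cls)
        if not known:
            return "UNKNOWN"
        if len(known) > 1:
            return f"WHICH {cls}? {known}"
        return "YES" if self.is_element(known[0], other) else "UNKNOWN"


memory = SetMemory()
memory.add_specific("BOY", "MIT-STUDENT")
memory.add_subset("MIT-STUDENT", "BRIGHT-PERSON")
first = memory.ask_specific("BOY", "BRIGHT-PERSON")
memory.add_element("JOHN", "BOY")
second = memory.ask_specific("BOY", "BRIGHT-PERSON")
print(first)
print(second)
```

The second question fails for the same reason as in the transcript: once JOHN is added, “the boy” no longer picks out a unique individual.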

SEMANTIC MEMORY
(Quillian, 1966)

(See the preceding section of this chapter.)

Example 1. Compare: CRY, COMFORT

(1) CRY2 IS AMONG OTHER THINGS TO MAKE
A SAD SOUND.*

(2) TO COMFORT3 CAN BE TO MAKE2 SOMETHING
LESS2 SAD.

Example 2. Compare: PLANT, LIVE

A. 1st Intersect: LIVE

(1) PLANT IS A LIVE STRUCTURE.

B. 2nd Intersect: LIVE

(1) PLANT IS STRUCTURE WHICH GET3-FOOD
FROM AIR. THIS FOOD IS THING WHICH BEING2
HAS-TO TAKE INTO ITSELF TO7 KEEP
LIVE.

Example 3. Compare: PLANT, MAN

A. 1st Intersect: ANIMAL

(1) PLANT IS NOT A ANIMAL STRUCTURE.

(2) MAN IS ANIMAL.

B. 2nd Intersect: PERSON

(1) TO PLANT3 IS FOR A PERSON SOMEONE TO
PUT SOMETHING INTO EARTH.

(2) MAN3 IS PERSON.

Example 4. Compare: PLANT, INDUSTRY

A. 1st Intersect: INDUSTRY

(1) PLANT2 IS APPARATUS WHICH PERSON USE
FOR5 PROCESS IN INDUSTRY.

Example 5. Compare: EARTH, LIVE

A. 1st Intersect: ANIMAL

(1) EARTH IS PLANET OF7 ANIMAL.

(2) TO LIVE IS TO HAVE EXISTENCE AS7 ANIMAL.

Example 6. Compare: FRIEND, COMFORT

A. 1st Intersect: PERSON

(1) FRIEND IS PERSON.

(2) COMFORT CAN BE WORD TO4 PERSON.

Example 7. Compare: FIRE, BURN

A. 1st Intersect: BURN

(1) FIRE IS CONDITION WHICH BURN.

B. 2nd Intersect: FIRE

(1) TO BURN2 CAN BE TO DESTROY2 SOMETHING
BY4 FIRE.

C. 3rd Intersect: BURN

(1) FIRE IS A FLAME CONDITION. THIS FLAME
CAN BE A GAS TONGUE4. THIS GAS IS GAS
WHICH BURN.
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Example  8.  Compare:  BUSINESS,  COMFORT 

A.  1st  Intersect:  PERSON 

(1)  BUSINESS  IS  ACTS  WHICH  PERSON  DO. 

(2)  COMFORT2  IS  CONDITIONS  WHICH  PERSON 
HAVE  NEED4. 

B.  2nd  Intersect:  PERSON 

(1)  BUSINESS5  IS  ACTS  WHICH  PERSON  DO. 

(2) COMFORT CAN BE WORD TO4 PERSON.

Example 9. Compare: MAN, BUSINESS

A. 1st Intersect: PERSON

(1) MAN3 IS PERSON.

(2)  BUSINESS  CAN  BE  ACTIVITY  WHICH  PERSON 
MUST  DO  WORK2. 

B.  2nd  Intersect:  GROUP 

(1)  MAN2  IS  MAN  AS9  GROUP. 

(2)  BUSINESS2  IS  QUESTIONS  FOR  ATTENTION 
OF  GROUP. 

Example  10.  Compare:  MAN,  LIVE 

A.  1st  Intersect:  ANIMAL 

(1)  MAN  IS  ANIMAL. 

(2)  TO  LIVE  IS  TO  HAVE  EXISTENCE  AS7  ANIMAL 

B.  2nd  Intersect:  LIVE 

(1) MAN IS A LIVE + BEING2.

PARRY

(Colby, Weber, Hilf, 1971)

PARRY is a program designed to simulate the conversational behavior
of a paranoid human being. The conversation below was produced with
the program operating in its “weak version” of paranoia. Sentences
labeled “I” are produced by the human interviewer; those labeled “O”
by the program.


(1) I - HOW LONG HAVE YOU BEEN IN THE HOSPITAL?
O - ABOUT A WEEK.

(2) I - ARE YOU DEPRESSED?
O - NO.

(3) I - WERE YOU TAKING DRUGS?
O - NO.

(4) I - WHAT IS YOUR PROBLEM?
O - PEOPLE MAKE ME NERVOUS.

(5) I - HOW DO THEY DO THAT?
O - THEY STARE AT ME.

(6) I - WHY?
O - MAYBE THEY DON’T LIKE THE WAY I LOOK OR SOMETHING.

(7) I - DO YOU LOOK UNUSUAL IN SOME WAY?
O - WHY ARE YOU INTERESTED IN MY LOOKS?

(8) I - BECAUSE YOU SAID PEOPLE STARE AT YOU.
O - WELL, SOMETIMES.

(9) I - HOW DO YOU GET ALONG WITH PEOPLE?
O - I DON’T COMMUNICATE VERY MUCH WITH THE OTHERS.

(10) O - THEY DON’T REALIZE WHAT’S GOING ON MOST OF THE
TIME.

(11) I - WHAT IS GOING ON?
O - DO YOU KNOW HOW THE UNDERWORLD OPERATES?
SHRDLU
(Winograd,  1971) 

SHRDLU carries out actions in, and answers questions about, a simple
world (containing a table, hand, and box, and several blocks and
pyramids), stored as a data structure inside the computer and displayed
on a television screen for the person talking to it. SHRDLU requires
from 5 to 20 seconds to analyze and respond to each sentence presented
to it (lower-case lines are the person’s; upper-case are the computer’s),
and its responses (actions and answers) are displayed in real time on
the screen. (See Fig. 7-12.) The nuances of this dialog are explained
in the book by Winograd.

1.  pick  up  a  big  red  block. 

OK. 

2.  grasp  the  pyramid. 

I  DON’T  UNDERSTAND  WHICH  PYRAMID  YOU  MEAN. 

3. find a block which is taller than the one you are holding and put
it into the box.

BY  “IT,”  I  ASSUME  YOU  MEAN  THE  BLOCK  WHICH  IS  TALLER 
THAN  THE  ONE  I  AM  HOLDING. 

OK. 

4.  what  does  the  box  contain? 

THE  BLUE  PYRAMID  AND  THE  BLUE  BLOCK. 

5.  what  is  the  pyramid  supported  by? 

THE BOX.

6.  how  many  blocks  are  not  in  the  box? 

FOUR  OF  THEM. 

7. is at least one of them narrower than the one which i told you to
pick up?

YES,  THE  RED  CUBE. 

8.  is  it  supported? 

YES,  BY  THE  TABLE. 

9.  can  the  table  pick  up  blocks? 

NO. 

10. can a pyramid be supported by a block?

YES.

11.  can  a  pyramid  support  a  pyramid? 
I  DON’T  KNOW. 

12.  stack  up  two  pyramids. 

I  CAN’T. 
13. the blue pyramid is nice.

I UNDERSTAND.

14. i like blocks which are not red, but i don’t like anything which
supports a pyramid.

I UNDERSTAND.

15. do i like the box?

NO.

16. do i like anything in the box?

YES,  TWO  THINGS:  THE  BLUE  BLOCK  AND  THE  BLUE  PYRAMID. 

17.  will  you  please  stack  up  both  of  the  red  blocks  and  either  a  green 
cube  or  a  pyramid? 

OK. 


18. which cube is sitting on the table?

THE LARGE GREEN ONE WHICH SUPPORTS THE RED PYRAMID.

19. is there a large block behind a pyramid?

YES, THREE OF THEM: A LARGE RED ONE, A LARGE GREEN
CUBE AND THE BLUE ONE.

20. put a small one onto the green cube which supports a pyramid.

OK.


21.  put  the  littlest  pyramid  on  top  of  it. 
OK. 


22. does the shortest thing the tallest pyramid’s support supports
support anything green?

YES,  THE  GREEN  PYRAMID. 


23. what color is it?

BY “IT,” I ASSUME YOU MEAN THE SHORTEST THING THE
TALLEST PYRAMID’S SUPPORT SUPPORTS.

RED

24. how many things are on top of green cubes?

I’M NOT SURE WHAT YOU MEAN BY “ON TOP OF” IN THE
PHRASE “ON TOP OF GREEN CUBES.”

DO YOU MEAN:

1— DIRECTLY ON THE SURFACE
2— ANYWHERE ON TOP OF?

2

THREE OF THEM.


25.  had  you  touched  any  pyramid  before  you  put  the  green  one  on  the 
little  cube? 

YES,  THE  GREEN  ONE. 


26. when did you pick it up?

WHILE I WAS STACKING UP THE RED CUBE, A LARGE RED
BLOCK AND A LARGE GREEN CUBE.




27. why?

TO  GET  RID  OF  IT. 

28.  why  did  you  do  that? 

TO  CLEAN  OFF  THE  RED  CUBE. 

29.  why  did  you  clear  off  that  cube? 

TO  PUT  IT  ON  A  LARGE  GREEN  CUBE. 

30.  why  did  you  do  that? 

TO  STACK  UP  THE  RED  CUBE,  A  LARGE  RED  BLOCK  AND  A 
LARGE  GREEN  CUBE. 

31.  why  did  you  do  that? 

BECAUSE  YOU  ASKED  ME  TO. 


32.  how  did  you  do  it? 

BY PUTTING A LARGE RED BLOCK ON THE TABLE; THEN LETTING
GO OF IT; THEN PUTTING A LARGE GREEN CUBE ON IT;
THEN LETTING GO OF THAT CUBE; THEN PUTTING THE RED CUBE
ON THAT CUBE; THEN LETTING GO OF THAT CUBE.

33. how many objects did you touch while you were doing it?

FOUR OF THEM.

34. what did the red cube support before you started to clean it off?

THE GREEN PYRAMID.


35.  there  were  five  blocks  to  the  left  of  the  box  then. 

NO,  ONLY  FOUR  OF  THEM:  THE  RED  CUBE,  TWO  LARGE  GREEN 
CUBES  AND  A  LARGE  RED  BLOCK. 

36.  put  the  blue  pyramid  on  the  block  in  the  box. 

OK. 


37.  is  there  anything  which  is  bigger  than  every  pyramid  but  is  not 
as  wide  as  the  thing  that  supports  it? 

YES,  THE  BLUE  BLOCK. 

39.  does  a  steeple 

SORRY  I  DON’T  KNOW  THE  WORD  “STEEPLE.”  PLEASE  TYPE  TWO 


40. a steeple is a stack which contains two green cubes and a
pyramid.

I  UNDERSTAND. 

41.  are  there  any  steeples  now? 

NO. 

42.  build  one. 

OK. 

43. call the biggest block “superblock.”

OK.

44. have you picked up superblock since we began?

YES.

45. why did you drop it?

BY “IT,” I ASSUME YOU MEAN THE SUPERBLOCK.

I CAN’T EXPLAIN A NON-EXISTENT EVENT.

46. is there anything to the right of the red pyramid?

YES, FOUR THINGS: THE BOX, THE RED CUBE, THE BLUE
PYRAMID AND THE BLOCK NAMED SUPERBLOCK.

47.  thank  you. 

YOU’RE  WELCOME! 


Does the shortest thing the tallest pyramid's
support supports support anything green?



LANGUAGE  AND  PERCEPTION 
Networks  of  Question-Answering  Programs 

The remainder of this chapter is devoted to discussing some general
topics of “semantic information processing” that have not been covered
explicitly elsewhere in this book. The subjects discussed include
networks of question answerers and protocol analyzers; grammatical
inference and pattern recognition; communication, teaching, and
learning; and the “self-knowledge” of intelligent machines. Mostly, we
shall have to content ourselves with a few general observations and some
pointers to the literature. The topics discussed represent areas of
future study that have not yet been tamed into programs by AI
researchers.

The preceding pages have shown that question-answering (and, in
general, language-understanding) programs can do some pretty amazing
things. On the one hand, the performance of Winograd’s program
indicates that computers may eventually handle the full complexity of
syntax in a natural language like English. Computers can be designed to
recognize and use word endings and context and “understand” a
conversation, at least when it is concerned with a relatively small
problem domain, like that of the SHRDLU world. On the other hand, the
performance of the programs written by Bobrow, Gelb, Charniak, Ramani,
Weizenbaum, and others indicates that computers can successfully handle
fairly complex problems (involving algebra, probability, and calculus)
when stated in limited subsets of English. Finally, computers can solve
a variety of very difficult mathematical problems, such as proving
theorems in abstract algebra or solving rather difficult integral
calculus problems.

It thus seems possible that, ultimately, language-understanding
programs will be constructed which will be capable of solving problems,
stated in English, from very difficult problem domains. As a working
principle, we may expect that if we can find a computer program
capable of solving the problems in some domain, when stated in
some appropriate formalism, then we can also find a computer program
capable of “understanding” English statements of the same problems,
to the extent that the second program can translate such English
problem statements into statements of the formalism appropriate for the
first program to solve them. The two programs together (plus, perhaps,
a third program to translate the answers) can function as a “question
answerer” for the problems of that domain.
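
As a rough illustration of this working principle, the sketch below
composes three small Python functions into a “question answerer.” It is
only a toy under stated assumptions: the limited English subset
(questions of the form “what is X plus Y?”), the tuple formalism, and
all of the function names are hypothetical and are not taken from any
of the programs cited in this chapter.

```python
import operator
import re

# Hypothetical formalism: a problem is a tuple (operator symbol, x, y).
WORD_TO_SYMBOL = {"plus": "+", "minus": "-", "times": "*"}
SYMBOL_TO_FUNCTION = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def translate_question(english):
    """The 'second program': limited English -> statement of the formalism."""
    match = re.fullmatch(r"what is (\d+) (plus|minus|times) (\d+)\?",
                         english.strip().lower())
    if match is None:
        raise ValueError("statement lies outside the limited English subset")
    x, word, y = match.groups()
    return (WORD_TO_SYMBOL[word], int(x), int(y))


def solve(problem):
    """The 'first program': solves problems stated in the formalism."""
    symbol, x, y = problem
    return SYMBOL_TO_FUNCTION[symbol](x, y)


def translate_answer(value):
    """The optional 'third program': formal answer -> English."""
    return f"THE ANSWER IS {value}."


def question_answerer(english):
    """The programs together function as a question answerer."""
    return translate_answer(solve(translate_question(english)))


print(question_answerer("What is 3 plus 4?"))
```

The solver never sees English and the translators never do arithmetic,
so in this sketch either part could be replaced without touching the
other, provided both keep to the same formalism.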

Given a set of English sentences (actually, a “structure” of such
sentences as determined by the conversation), some of which are
questions, a question answerer should be able to

1. Answer questions, in English.

2. Make functional statements like “This will take a little
processing” or “Sorry, I can’t answer question X.”

3. In general, ask questions in English (and, if necessary,
justify them on the basis of their relevance to finding
answers).

4. Ultimately, make general statements that are neither questions
nor answers, but simply “interesting” observations.

Since both input and output for a question-answering program are sets
of English statements, it is natural to think of “networks” of question
answerers. It may be desirable to use networks of question answerers
in the construction of large, “general” question answerers (GQAs). Such
a network might have the following capabilities:

1. It could be “self-organizing.” At each moment the GQA could
make use of a different configuration of “specialized” question
answerers, each one either asking questions or answering
questions (or making other statements, etc.) posed by other
question answerers or by the user of the system.

2. It is conceivable that it could simulate a “synergetic” or
“gestalt” effect. This means that the GQA as a whole could answer
some questions that its parts could not answer. Of course the
whole could not ask questions that could not be asked by at
least one of its parts. The “synergetic” ability of the GQA
depends on the ability of each of its specialized question
answerers to ask questions it may not be able to answer.
Question asking may be considered an aspect of problem
reduction: The simplest type of GQA corresponds to the parallel
implementation of a single problem-reduction problem solver.

3. The difficulties involved in adding to a GQA would be
minimized by the use of some common language (not necessarily
a natural language) for the communication of problems and
answers between components of the GQA (note 7-12).

4. If it is found that several question answerers are, through
cooperating in a GQA, able to achieve solutions to a domain
of problems that none of them could solve alone, then it may
be desirable to have another kind of program (called a protocol
analyzer) for the purpose of analyzing the conversations
and other computations they produce in solving these problems,
and which could develop a new, specialized question
answerer to simulate their ability (although at a faster speed)
to solve the problems of that domain.
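
The capabilities listed above can be made a little more concrete with a
small Python sketch. Since the GQA is offered only as a thought
experiment, everything here is an assumption made for illustration: the
tuple-based “common language,” the two specialists, and the `GQA` class
are hypothetical, and the routing is sequential rather than parallel.
Each specialist may answer a question itself or ask the network a
question it cannot answer, and the network records every exchange as
raw material for a protocol analyzer.

```python
from typing import Callable, Optional

# Hypothetical common language: a question is a tuple such as ("area", 3, 4).
Question = tuple
Ask = Callable[[Question], Optional[float]]
Specialist = Callable[[Question, Ask], Optional[float]]


def multiplier(question: Question, ask: Ask) -> Optional[float]:
    """Answers product questions; knows nothing about geometry."""
    if question[0] == "product":
        return question[1] * question[2]
    return None


def geometer(question: Question, ask: Ask) -> Optional[float]:
    """Reduces an area question to a product question it cannot answer itself."""
    if question[0] == "area":
        return ask(("product", question[1], question[2]))
    return None


class GQA:
    """A 'general' question answerer built from specialized ones."""

    def __init__(self, specialists, max_depth: int = 10):
        self.specialists = list(specialists)
        self.max_depth = max_depth
        self.protocol = []  # (question, answer) pairs, in order of completion

    def ask(self, question: Question, depth: int = 0) -> Optional[float]:
        if depth > self.max_depth:  # guard against endless question asking
            return None
        for specialist in self.specialists:
            answer = specialist(question, lambda q: self.ask(q, depth + 1))
            if answer is not None:
                self.protocol.append((question, answer))
                return answer
        return None


network = GQA([geometer, multiplier])
print(network.ask(("area", 3, 4)))
print(network.protocol)
```

Neither specialist can answer the area question alone, but the network
can, which is a very small instance of the “synergetic” effect of
capability 2. The recorded protocol is the kind of conversation that
the protocol analyzer of capability 4 would have to study.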

The idea of “protocol analysis” was first developed by Newell and
Simon (1963) as a process that AI researchers should perform on the
conversational problem-solving behavior of people (specifically,
individuals) as a guide to the development of computer programs capable
of simulating human problem-solving behavior. Some relevant papers
are Waterman and Newell (1971), Hewitt (“procedural abstraction,”
1968 et seq.) and Manna and Waldinger (1971). Norman (1972)
presents an extensive discussion on the nature of human
question-answering processes. Our discussion of the GQA concept (which
is intended only as a thought-experiment) is continued in the later
section entitled “Communication, Teaching, and Learning.”

Pattern  Recognition  and  Grammatical  Inference 

An interesting question for the reader to investigate is, “What will
happen if we attempt to train a pattern recognizer based on statistical
decision theory (see Duda and Hart, 1973) to recognize the sunflower
pattern?” (See Figs. 2-1 and 5-2.) One way in which we might train
the pattern recognizer is as follows: A series of samples will be
presented to the pattern recognizer, each sample corresponding to the
coordinates (say, Cartesian) of a point in the plane. After each sample
is presented, the pattern recognizer is required to classify it either as
belonging or not belonging to the “sunflower pattern.” After it makes
its classification, it is told the actual classification of the sample and
must modify its features and probability functions accordingly. Then
the next sample is presented, etc.*

* In an actual experiment it would be desirable to generalize the sunflower
pattern to include as pattern examples all points within some small radius of
the “true” pattern examples.

So far as the author knows, there is no statistically based pattern
recognizer that would, after the presentation of only a finite number
of samples, be able to recognize successfully the sunflower pattern
(i.e., be able to classify correctly any sample one might then choose
to show it). The reason for this is that the points (dots) that belong
to the pattern satisfy neither of the requirements typically specified for
the point sets that such recognizers are designed to learn to classify.
The points of the sunflower pattern are not a continuous set, nor are
they a bounded set (one cannot draw a simple, closed curve of finite
length that will enclose them all). Currently developed techniques for
the generation and selection of features and the estimation of density
functions are probably insufficient to enable the statistically based
pattern recognizer to do anything more than “learn” to classify correctly
those samples it has already been shown. Since there are an infinite
number of points belonging to the sunflower pattern, there will always
be an infinite number of pattern samples it will not have learned to
recognize.†
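
The training procedure described above (present a sample, require a
classification, reveal the actual classification, let the recognizer
modify itself) can be sketched in Python as follows. The sketch is not
a recognizer based on statistical decision theory. It uses a
deliberately naive recognizer that only memorizes what it is told, in
order to show the limiting case in which a recognizer “learns” nothing
beyond the samples it has been shown. The pattern is also an
assumption: the lattice points (n, n²) serve as a hypothetical stand-in
for an unbounded, discrete pattern and are not the sunflower pattern of
Fig. 2-1.

```python
def in_pattern(point):
    """Hypothetical stand-in pattern: the lattice points (n, n*n), n = 0, 1, 2, ..."""
    x, y = point
    return x >= 0 and y == x * x


class MemorizingRecognizer:
    """Naive recognizer that only remembers the samples it has been shown."""

    def __init__(self):
        self.memory = {}  # point -> the classification it was told

    def classify(self, point):
        # Points never shown are guessed to lie outside the pattern.
        return self.memory.get(point, False)

    def correct(self, point, actual):
        # "Modify itself": here, simply store the actual classification.
        self.memory[point] = actual


def train(recognizer, samples):
    """Present the samples one at a time; return the number of wrong guesses."""
    mistakes = 0
    for point in samples:
        guess = recognizer.classify(point)
        actual = in_pattern(point)  # the trainer reveals the actual class
        if guess != actual:
            mistakes += 1
        recognizer.correct(point, actual)
    return mistakes


recognizer = MemorizingRecognizer()
samples = [(n, n * n) for n in range(20)] + [(n, n * n + 1) for n in range(20)]
train(recognizer, samples)
print(recognizer.classify((5, 25)))       # True: this sample was shown
print(recognizer.classify((100, 10000)))  # False: a pattern example never shown
```

However many samples are presented, infinitely many points of the form
(n, n²) remain that this recognizer misclassifies, whereas a recognizer
that had inferred the rule y = x² would classify all of them correctly.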

Yet it seems quite plausible that a truly intelligent pattern
recognizer would be able to learn to recognize the sunflower pattern. A
person observing Fig. 2-1 would have little difficulty in estimating
where new dots could be added, and it is conceivable that he could
eventually develop an accurate computational procedure for correctly
classifying any sample that he might be shown. Of course this ability
on his part might be due largely to the preprocessing ability of his visual
system (which would correspond to giving the pattern recognizer a
collection of useful features to detect). However, it still seems plausible
that, even without the visual preprocessing ability, a human being could
learn to recognize the pattern. Intuitively, the sunflower pattern forms
a relatively simple “structure,” in which each pattern sample bears a
fairly simple relationship to certain other pattern samples; the existence
of this relationship makes it possible for one to generate as many
samples of the sunflower pattern as desired, and also makes it possible
for one to decide whether or not a given sample is a pattern
sample. People are extremely talented at learning to recognize
structures, whereas statistically based pattern recognizers are not.

We don’t have far to look to find another case of a pattern in which
structural relationships play an important part. Namely, a natural
language like English may itself be considered to be a pattern, the pattern
examples of which are sentences, phrases, and words. The language
itself may also be said to be a structure, insofar as there are relationships
that exist between its pattern examples (e.g., A is-defined-to-be Z).
Again, when we normally use the English language, we form
“conversations,” which are also essentially structures of these pattern
examples. Any formalization for the semantics of English would in effect
denote a set (probably infinite) of “meaningful” conversations, and thus
would be a description for the pattern whose pattern examples are
“meaningful conversations.” Moreover, sentences, phrases, and words are
themselves structures. There is thus a structural aspect to the pattern
which is the English language as a whole, to the pattern of its use in
making conversations, and to its “elementary” pattern examples (its
sentences and words). Finally, and just as important, there are aspects
of structure and pattern to the “meanings” that sentences and
conversations may have. In general, we may think of the “meaning” of a
sentence as being a collection of situations, each of which the sentence
possibly denotes as being the case (an unambiguous sentence would denote
only one situation). This collection of situations may be considered to
be a pattern, while the situations themselves will in general have a
structural aspect. We may think of a natural language as being a pattern
for the description of patterns.

† The author has checked the plausibility of this argument with Richard
Duda, and wishes to thank him for an enlightening discussion on the topic.

The fact that there are structural relationships underlying many
real-world patterns and their pattern examples, together with the fact
that such relationships are important in natural languages, has led a
number of investigators to suggest that linguistic techniques should be
used by pattern perceiving systems (other investigators have suggested
that language-understanding programs should make use of pattern
perception techniques; see McConlogue and Simmons, 1965). Research in
this area has concentrated in two directions: First, some researchers
have attempted to find languages and grammars that could be used to
describe and recognize visual patterns; see Narasimhan (1964), Evans
(1971), Shaw (1968), Kirsch (1964), Winston (1970), Watanabe
(1969, 1971), Banerji (1971), Pfaltz and Rosenfeld (1969), Uhr
(1971), and Morofsky and Wong (1971). Second, other researchers
have investigated the ability of computer programs to “learn” to
recognize patterns corresponding to artificial languages (i.e., sets of
strings) by inferring grammars for them; this is known as the
grammatical inference paradigm for pattern recognition; see
Crespi-Reghizzi (1971), Feldman (1967), and Horning (1969). The first
approach will not be discussed in detail in this section except to note
that Winston’s work was described in Chapter 5. However, much of this
work is relevant to the grammatical inference paradigm.

A grammatical inference problem has the form: “Given two sets
of strings, A and B, which are mutually disjoint (they do not have a
common element), find a grammar G such that the language L(G) it
generates contains as sentences all the strings of A but none of the strings
of B; L(G) may, of course, contain other sentences besides those that
belong to A” (note 7-13). A more general grammatical inference
problem might ask us to find a set of such grammars. It should be noted
that, as stated, the grammatical inference problem is trivially solvable,
for any appropriate sets A and B, because we can always specify that G
shall be the “enumerative” grammar that contains exactly those
production rules of the form S → a, where a is any string of A. Moreover,
there are always an infinite number of grammars that could be put
forward as a solution to a given grammatical inference problem. However,
it is possible to specify a number of different conditions one can add to
the statement of a grammatical inference problem that will make finding
a solution more relevant to pattern perception. Thus, we might specify
that any “solution grammar” G for a grammatical inference problem
shall generate a language L(G) with an infinite number of sentences,
unless the problem explicitly states that L(G) is to be finite (note 7-13).
This condition ensures that solution grammars will exhibit “perceptual
generalization.” Or, if we can find a suitable way of measuring the
“complexity” of arbitrary grammars, we can specify that a solution
grammar for a grammatical inference problem shall be any of the least
complex grammars that satisfy the other conditions of the problem.
Finally, following Chaitin (1966, 1969) and Martin-Löf (1966), we
may decide that some sets A are to be regarded as essentially
“patternless” or “random” if there are no grammars for them (that is,
no G such that A ⊆ L(G)) which are less complex than their enumerative
grammars.
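
A small Python sketch may help to fix these ideas. It checks whether a
context-free grammar solves a grammatical inference problem and
compares the enumerative grammar with a grammar that generalizes.
Several things here are assumptions made only for illustration: a
grammar is written as a dictionary from nonterminal letters to lists of
right-hand sides, every production is assumed to contain at least one
terminal symbol (so that the bounded search terminates), and
“complexity” is measured crudely as the total length of the right-hand
sides, which is just one of many possible measures.

```python
def language(grammar, max_len, start="S"):
    """All terminal strings of length <= max_len derivable from `start`."""
    sentences, seen, frontier = set(), {start}, [start]
    while frontier:
        form = frontier.pop()
        # Position of the leftmost nonterminal in the sentential form, if any.
        index = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if index is None:
            sentences.add(form)
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                frontier.append(new_form)
    return sentences


def is_solution(grammar, A, B):
    """True if L(G) contains every string of A and no string of B."""
    generated = language(grammar, max(len(s) for s in A | B))
    return A <= generated and not (B & generated)


def complexity(grammar):
    """Crude, assumed measure: total length of all right-hand sides."""
    return sum(len(rhs) for alternatives in grammar.values() for rhs in alternatives)


A = {"ab", "aabb", "aaabbb", "aaaabbbb"}
B = {"a", "ba", "abb", "aab"}

enumerative = {"S": sorted(A)}       # one production S -> a for each string a of A
recursive = {"S": ["ab", "aSb"]}     # generates an infinite language

for name, g in (("enumerative", enumerative), ("recursive", recursive)):
    print(name, is_solution(g, A, B), complexity(g))
```

Both grammars are solutions, but the recursive grammar has complexity 5
against 20 for the enumerative grammar, and it also generates strings
such as “aaaaabbbbb” that were never presented. Under this measure the
set A is therefore not “patternless” in the sense described above.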

The grammatical inference paradigm for pattern recognition, then,
consists in seeing the task of a pattern recognizer to be that of inferring
a grammar that generates those samples which are pattern examples of
the pattern it is learning to classify, but which does not generate those
samples that are not pattern examples. It is clear that this paradigm is a
good one for those patterns whose pattern examples are structures with
a linear, stringlike nature. However, for the paradigm to be useful for
pattern recognition in general, we would probably desire that our
notions of “language” and “grammar” be extended to include languages
whose sentences are nonstringlike structures. That is, we would like to
formalize a notion of “general language” and “general grammar” in
which sentences can be arbitrary structures of symbols, and grammars
can be flexible procedures for building structures. It is still not clear
what a good, general formalization for “structure” should be like.
Indeed, the patterns existing in different environments will often be most
easily characterized by using different kinds of structures; among the
best “general language” formalizations at the moment are the “web
languages” of Pfaltz and Rosenfeld (1969), the “hierarchical graph
languages” investigated by Pratt (1969 et seq.) and Winston (1970),
and the hierarchical list structures and recursively defined pattern rules
investigated by Morofsky and Wong (1971) and Hewitt (1968 et seq.).
A good research project would be to investigate whether these concepts
can be extended to include “continuous structures” and “changing
structures” (or “processes”). This subject is mentioned again in Chapter 8.
Finally, it should be mentioned that there is as yet no clearly adequate
definition for the concept of “complexity,” as it applies to programs,
sentences, grammars, patterns, or structures in general. In addition to
the papers on grammatical inference cited above, the reader should
refer to Arbib and Blum (1965), Blum (1967), Buneman (1970),
Cleave (1963), Cobham (1964), Hartmanis and Stearns (1965),
Loveland (1969), Mowshowitz (1967), and van Emden (1970, 1971).

Communication,  Teaching,  and  Learning 

McCarthy (1968), Minsky (1968a,b, 1970), Hewitt (1968 et seq.),
and Winograd (1971, 1972), among others, have presented an extensive
array of commentary on the relationships between communication,
teaching, and learning. The following passage from McCarthy
(1968) is particularly insightful:

If one wants a machine to be able to discover an abstraction, it seems
most likely that the machine must be able to represent this
abstraction in some relatively simple way.

There is one known way of making a machine capable of learning
arbitrary behavior, and thus to anticipate every kind of behavior:
This is to make it possible for the machine to simulate arbitrary
behaviors and try them out. These behaviors may be represented either
by nerve nets [Minsky, 1962], by Turing machines [McCarthy,
1956], or by calculator programs [Friedberg, 1958, 1959].

In our opinion, a system which is to evolve intelligence of
human order should have at least the following features:

1. All behaviors must be representable in the system. Therefore, the
system should either be able to construct arbitrary automata or
to program in some general-purpose programming language.

2. Interesting changes in behavior must be expressible in a simple
way.

3. All aspects of behavior except the most routine should be
improvable. In particular, the improving mechanism should be
improvable.

4. The machine must have or evolve concepts of partial success
because on difficult problems decisive successes or failures come
too infrequently.

5. The system must be able to create subroutines which can be
included in procedures as units . . .

. . . We base ourselves on the idea that in order for a program to be
capable of learning something it must first be capable of being told it
(pp. 404-405).
In the present author’s opinion the final statement of the above
passage will probably turn out to be one of the basic principles of the
“Theory of Artificial Intelligence,” should such a theory ever be
established; at the moment it certainly amounts to a guideline that
underlies a great deal of research. The ability to understand an abstraction
(carry out a procedure described by a program) is effectively essential
to the ability to create the abstraction. The more simply the abstraction
can be stated to a machine, the more likely we can make the machine
find the abstraction by itself. For machines to demonstrate really
intelligent, effective learning, it will be necessary to give them a language
capability for a general-purpose programming language that facilitates
the description of procedures (abstractions, behaviors, “aptitudes”)
which are appropriate for their problem domains.

As was suggested in the discussion on networks of question
answerers, the use of an appropriate language and communication process
may enable us to design large problem solvers with an ability to solve
problems greater than that of the individual components designed
explicitly. The performance of the large problem solver may provide a
“protocol” that it can use in the design of new individual components.
The effect of a new individual component (specialized question
answerer) will be to make it possible for the large machine to solve a
certain class of problems more efficiently. As a consequence of its
increased efficiency at solving this class of problems, the large machine
may then be able to solve other problems, perhaps ones that it could
not previously solve at all.

It  may  be  possible  for  a  machine  to  learn  to  solve  problems  more 
and  more  efficiently  and,  eventually,  to  “bootstrap”  itself  into  an  ability 
to  solve  problems  it  could  not  previously  solve. 

The idea of “self-improving” artificial intelligence is not yet
completely formalized. (Indeed, we may speculate that there is no complete
formalization, by definition; see McCarthy’s condition 3 above.) The
discussion of this topic will be taken up again in Chapter 8, where
evolutionary programs will be treated in more detail. The reader should
not confuse the discussion of self-improvement in this chapter with
other theories discussed in Chapter 8 (e.g., Myhill and Holland). For
a good analogy to the mechanism currently being discussed, consider
the process by which a person learns to perform a new physical task
(e.g., playing a guitar): The proficient performance of the complete
task (e.g., playing a song) requires a large set of proficient performances
of smaller tasks (playing riffs, bridges, estimating notes before they are
struck, coordinating hands, eyes, and voice, etc.). The task is learnable
because there exists a training sequence of simpler tasks that a person
can learn to perform efficiently. He begins by “thinking about” the
simplest tasks of the sequence, and performing them slowly; with
practice he is able to translate his performance of the simple tasks into
habits and to begin “thinking about” the harder tasks of the sequence.
Many authors have stressed the importance of “training sequences” in
human and machine learning.
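
The role of a training sequence, and the way in which increased
efficiency on simple tasks can make harder tasks solvable at all, can
be illustrated with a toy Python sketch. All of it is an assumption
made for the sake of the illustration: the “task” (computing the nth
Fibonacci number by naive recursion), the fixed budget of steps that
stands for limited “thinking” effort, and the `Learner` class are
hypothetical and are not drawn from any program discussed in this book.

```python
class BudgetExceeded(Exception):
    """Raised when a task needs more steps than the learner may spend."""


class Learner:
    def __init__(self, budget):
        self.budget = budget  # maximum number of steps per attempt
        self.habits = {}      # tasks already mastered: n -> result

    def attempt(self, n):
        """Try task n within the budget; on success, keep it as a habit."""
        steps = 0

        def solve(k):
            nonlocal steps
            steps += 1
            if steps > self.budget:
                raise BudgetExceeded
            if k in self.habits:  # a habit costs a single step
                return self.habits[k]
            if k < 2:
                return k
            return solve(k - 1) + solve(k - 2)  # slow, deliberate "thinking"

        try:
            result = solve(n)
        except BudgetExceeded:
            return None
        self.habits[n] = result
        return result


untrained = Learner(budget=50)
print(untrained.attempt(25))  # None: the task is too hard to do directly

trained = Learner(budget=50)
for task in range(25):        # the training sequence of simpler tasks
    trained.attempt(task)
print(trained.attempt(25))    # 75025: the same task, now within the budget
```

Both learners have the same budget. The difference lies entirely in the
habits acquired along the training sequence, which corresponds to the
“bootstrapping” effect described earlier in this section.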

As  a  conclusion  to  this  chapter,  the  student  is  invited  to  read  Wino- 
grad  s  (1971)  discussion  of  “teaching,  telling,  and  learning”  and,  in 
particular,  his  description  of  the  hierarchy  of  knowledge  that  an  intelli¬ 
gent  machine  should  be  expected  to  possess;  this  hierarchy  corresponds 
essentially  to  the  hierarchy  of  languages  (from  its  machine  language  to, 
perhaps,  a  natural  language)  in  which  it  can  accept  information.  Also 
suggested  are  Minsky’s  (1968,  1970)  discussions  on  the  nature  of  the 
knowledge  that  an  intelligent  machine  can  possess  (in  particular,  its 
“self-models”)  .  One  of  the  major  ways  in  which  an  intelligent  computer 
can  be  different  from  a  human  being  is  that  the  computer  can  “know” 
exactly  what  kind  of  machine  it  is.  The  intelligent  computer  could  read 
through  the  listings  of  its  own  programs  and  the  specifications  of  its 
physical  construction  as  well,  whereas  the  human  being  seems  unable 
(at  least,  consciously)  to  perform  the  corresponding  tasks  for  himself. 
It  will  be  interesting  to  see  what  kinds  of  “self-improvement”  this  will 
make possible for machines. In fact, it may eventually be of the utmost
importance for AI researchers to understand the phenomenon of machine
self-knowledge and its relationship to the “psychological” behaviors
intelligent machines might demonstrate. How can we guarantee
that  an  artificial  intelligence  will  “like”  the  nature  of  its  existence?  (See 
note  7-14.) 


NOTES 

7-1. Throughout, this chapter adopts the idea that “understanding,”
whether  human  or  mechanical,  is  a  process  that  involves  “model  making.” 
However,  no  exploration  is  made  of  the  ramifications  of  this  thesis  as  it 
regards  human  understanding  very  deeply.  So,  the  student  should  be  ad¬ 
vised  that  it  is  not  the  only  idea  currently  being  considered  by  psychologists. 
Indeed,  there  has  been  a  sizable  school  of  psychologists  maintaining  that 
explanations of human understanding, intelligence, etc., should be “neutral”
and “behavioristic” and not “mentalistic”; that the ideas of “models,”
“concepts,” and “ideas” should be avoided; and that a testable psychological
theory  should  not  make  use  of  them.  One  can  understand  their  reluctance 
to admit these concepts, which have been the Maypoles for circular
philosophical arguments since time immemorial, into their studies and
laboratories. Still, the mentalistic approach can be used quite profitably by
computer scientists. And, since people find it easier (in English) to talk about
“understanding”  if  they  use  words  like  “idea”  and  “concept,”  it  will  make 
our  exposition  clearer  to  take  this  approach.  An  excellent  discussion  on 
“matter,  mind,  and  models”  is  given  by  Minsky  (1968). 

7-2.  The  paragraph  citing  this  note  glosses  over  certain  relatively  minor 
points: (1) In some languages, words (and phrases) are written as
ideograms; that is, they are represented “pictorially.” They still have a
“structural nature,” but it is not that of a sequence or string. (2) Besides spoken
and  “written”  languages  there  are  also  human  sign  languages,  “whistle” 
languages  (used  on  the  Canary  Islands),  and  Braille  systems.  (3)  Some  other 
organism-level  languages  do  have  a  structural  (in  particular,  a  stringlike) 
nature;  for  example,  bees  communicate  information  about  food,  using 
sentences  that  consist  of  fairly  complex  sequences  of  body  motions  (a  sort 
of  “dance”).  (4)  We  have  not  discussed  the  use  of  punctuation  in  written 
sentences.  (5)  The  phonemes  in  spoken  sentences  usually  do  not  separate 
precisely  into  words;  rather,  people  tend  to  run  some  words  together. 

7-3.  This  concept  of  a  universal  grammar  is  echoed  in  at  least  three  re¬ 
spects.  First,  our  societies  have  also  developed  musical  forms  that  show  great 
similarities  from  one  culture  to  another,  so  much  so  that  music  is  itself 
often called a “universal language.” Second, C. S. Peirce was led by his
investigation  of  the  history  of  natural  science  to  suggest  that  man  has  a  re¬ 
markable  ability  to  formulate  successful  hypotheses  about  the  physical  uni¬ 
verse,  considering  the  huge  number  of  different  explanations  that  could  be 
advanced  for  a  given  phenomenon,  and  from  this  he  conjectured  that  we 
have  an  innate  tendency  to  perceive  “simplicity”  (infer  grammars;  see  the 
fourth  section  of  this  chapter)  in  ways  that  fortunately  lie  very  close  to  the 
actual  structure  of  the  laws  of  nature.  Finally,  Leibniz  long  ago  proposed  to 
design  a  “universal  language”  that  would  be  a  calculus  for  determining  all 
the  truths  of  philosophy  and  the  natural  sciences. 

7-4.  There  is  still  very  little  known  about  the  linguistic  abilities  of  dolphins 
(see  Lilly,  1968  et  seq.).  It  should  be  noted  that  the  size  and  complexity  of 
the  dolphin  brain  appear  to  be  comparable  to  that  of  the  human  brain. 
Dolphins  seem  to  be  able  to  communicate  with  each  other,  using  as  sig¬ 
nals  rapid  sequences  of  high-pitched  sounds.  Furthermore,  a  dolphin  has 
two  sets  of  vocal  chords,  which  it  can  evidently  use  independently  of  each 
other.  It  is  not  known  whether  their  language  has  any  of  the  aspects  of 
generality  (extensibility,  self-reference)  possessed  by  human  languages. 
However,  efforts  are  being  made  to  teach  dolphins  the  human  “whistle 
language”  mentioned  above. 

7-5.  There  are  “languages”  which  are  not  type  0  (i.e.,  do  not  have  a 
phrase-structure  grammar;  see  Chaitin,  1966,  1969).  It  is  certain  that  these 
languages  cannot  be  used  as  “programming  languages”  in  the  sense  of  U, 
and it is very doubtful whether they could be meaningful as “machine
languages” in the sense of L.

7-6.  In  essence,  there  are  types  of  “information”  not  considered  explicitly 
in  the  Shannon  and  Weaver  (1949)  theory  of  communication  (see  Chapter 
2); in addition to “occurrence information” that a sentence carries because
it is transmitted and received while other sentences are not, a sentence also
carries “syntactic information” with respect to a grammar for the language
to  which  it  belongs,  and  “semantic  information”  about  whatever  it  describes. 

7-7. It is almost always desirable that each sentence in L' have exactly one
derivation in the grammar G being used. The languages with grammars of
this sort are the LR(k) languages; in fact, the LR(1) languages are “good
enough.” Thus, the programming languages used by modern computers are
always LR(1) languages. A definition for these languages has not been
presented here, but one can be found in Hopcroft and Ullman (1969, p. 180).
Knuth (1965, 1967) presented the basic results concerning LR(k)
programming languages.

7-8. Some of the many higher-level languages that have been developed
include those that facilitate the description of procedures for general
scientific data processing (FORTRAN), business data processing (COBOL), string
manipulation (SNOBOL), and list structure manipulation (LISP); LISP is
also designed to facilitate the use of recursive procedures. In this book are
discussed two other high-level languages, PLANNER and PROGRAMMAR,
designed to facilitate the description of planlike procedures for theorem
proving and natural language sentence parsing, respectively. At least one
computer has been constructed for which a higher-level language (known as
SYMBOL) is actually its machine language; see Rice and Smith (1971) for
further information.

7-9.  Attempts  at  “mechanical  translation”  were  first  made  in  the  1950s 
and  thus  represent  some  of  the  earliest  investigations  in  the  field  of  artificial 
intelligence,  having  taken  place  before  the  field  had  a  generally  accepted 
name.  All  early  attempts  were  failures,  albeit  instructive  ones.  Since  then, 
the subject of mechanical translation has been postponed somewhat by AI
researchers.  It  is  almost  universally  estimated  to  be  a  very  difficult,  “ultimate” 
problem.  Bar-Hillel  (1964)  presented  a  good  summary  and  criticism  of  the 
early  work. 

7-10.  One  ultimate  test  of  the  language-understanding  abilities  of  com¬ 
puters  would  be  to  see  how  well  they  could  play  “language  games.”  Some 
simple  language  games  that,  to  the  present  author’s  knowledge,  have  not 
been  investigated  are  crossword  puzzles,  Scrabble,  and  the  game  of  20 
Questions.  A  rather  entertaining  game,  which  is  difficult  for  people  (and 
currently  impossible  for  computers)  to  play,  is  the  “question  tennis”  game 
of  Rosencrantz  and  Guildenstern  Are  Dead,  a  play  by  Tom  Stoppard;  an 
example of question tennis is given on pages 42-44 of Stoppard (1967).
Games  such  as  these  would  require  the  successful  integration  of  a  wide 
variety  of  semantic  information-processing  techniques,  if  a  computer  pro¬ 
gram  were  to  play  them  well.  The  work  of  Wittgenstein  presents  an  exten¬ 
sive  treatment  of  the  “language  game”  concept  and  its  relation  to  the  con¬ 
cept  of  “meaning.” 

7-11. There is no known a priori limit to the extensibility of a computer’s
language  capability  other  than  those  limits  of  a  purely  practical  nature 
(memory  size  and  processing  speed).  Although  the  difficulties  involved 
with  understanding  natural  language  should  not  be  minimized,  no  one  has 
been  able  to  show,  for  example,  that  English  is  theoretically  outside  the 
language  capability  of  all  computers;  indeed,  such  a  proof  would  indicate 
the  falsity  of  Church’s  thesis.  The  language-understanding  programs  dis¬ 
cussed  in  this  chapter  are  examples  that  certain  subsets  of  English  are 
definitely  within  the  language  capabilities  of  computers. 

7-12. Of course we assume that, whatever common problem language is
used,  it  will  be  extensible,  and  that  each  specialized  question  answerer  will 
be  able  to  “understand”  its  extensibility.  However,  it  may  be  argued  that  it 
takes  relatively  little  knowledge  of  probability  to  ask  (at  least  a  simple) 
probability  question;  each  question  answerer  will  have  to  be  able  to  recog¬ 
nize  those  questions  that  it  might  be  able  to  answer  and,  ultimately,  it  will 
have  to  be  able  to  recognize  those  questions  that  are  relevant  to  its  current 
problem  and  which  other  question  answerers  may  be  able  to  answer.  “Prob¬ 
lem  recognition”  techniques  are  employed  by  current  question  answerers 
(e.g.,  Gelb,  Charniak,  Quillian),  but  of  course  there  is  still  a  lot  that  is  not 
known  about  the  subject. 

7-13. It is possible for the statement of a grammatical inference problem
to specify that a solution grammar generate exactly those strings of A and
no others; one way of doing this is to define the set $V_A$ as being the set of
all symbols that occur in the strings of A, and then to define $B = V_A^* - A$.
However, most applications of the grammatical inference paradigm are
motivated by the ability of grammars to provide finite descriptions for infinite
sets (languages, patterns), and by the consequent ability of a machine that
infers grammars to simulate perceptual generalization (the ability of people
to learn to recognize an infinite number of samples as being examples of a
pattern after having observed only a finite number of that pattern’s
examples).

7-14.  Why  should  this  question  be  asked?  In  addition  to  the  possibility  of 
an  altruistic  desire  on  the  part  of  computer  scientists  to  make  their  machines 
“happy  and  contented,”  there  is  the  more  concrete  reason  (for  us,  if  not  for 
the  machine)  that  we  would  like  people  to  be  relatively  happy  and  contented 
concerning  their  interactions  with  these  machines.  We  may  have  to  learn 
to  design  intelligent  computers  that  are  incapable  of  setting  up  certain  goals 
relating to changes in selected aspects of their performance and design,
namely, those aspects that are “people protecting.” (See the final sections of
Chapters 8 and 9.)


EXERCISES 


7-1. Design a computer program that could generate the set of “Crypt Addition”
problems.  (See  Exercise  3-5.) 

7-2.  Consider  various  methods  for  making  a  computer  generate  English 
fortunes,  such  as  are  found  in  fortune-cookies.  What  are  the  desirable  attributes 
of  fortune-cookie  fortunes?  (Some  may  claim  that  a  fortune-cookie’s  most 
desirable  aspect  is  that  it  is  made  by  a  human:  Can  a  machine  be  human?)  Is  it 
possible to develop a program that can generate “all meaningful” one- or
two-sentence fortunes? Is this a desirable exercise for AI researchers to perform?
(Note:  If  you  do  decide  to  perform  this  exercise,  it  might  be  fun  to  do  it  as  a 
class  exercise,  with  a  field  trip  to  a  local  Chinese  restaurant.) 

7-3.  Show  how  the  following  formula  (Watanabe,  1969,  p.  13)  can  be  stated 
in  English: 


Fv  =  U  U  •  •  •  U  E  „ 

£  Wa  +  l  ^  /jg 


7-4.  Discuss  the  subproblems  that  might  be  considered  by  a  computer  program 
for  solving  crossword  puzzles. 

7-5. Prove that a string language is of type 0 iff there is a Turing machine that
accepts  it. 

7-6. “Hucbald, Abbot of Saint-Amand, wrote a learned and insufferably boring
poem,  the  Eclogia  de  Calvi,  circa  877  a.d.,  justifying  and  praising  baldness,  in 
which  not  only  the  best  and  greatest  men  had  apparently  been  so  distinguished, 
but  every  word  of  the  146  verses  begins  with  ‘c’.”  (Beckwith,  1964,  p.  74). 

Hucbald’s  poem  was  written  in  Latin,  but  the  solution  of  similar  linguistic 
problems,  in  any  language,  indicates  some  proficiency  at  semantic  information 
processing. Outline roughly the subproblems involved in

(a)  Writing  an  n  word  sentence  in  which  each  word  starts  with  a  given 
letter. 

(b)  Writing  an  m  verse  poem  of  a  given  meter  and  rhyme  scheme,  in 
which  each  word  starts  with  the  same  given  letter. 

(c)  Doing  both  (a)  and  (b)  in  such  a  manner  that  the  result  is  “mean¬ 
ingful”  (although,  perhaps,  insufferably  boring). 

(d) Is there a connection between Hucbald’s name and the subject of his
poem? 

7-7. Describe how a GQA might be enabled to “learn how to learn.”


Computer-produced mural based on a photograph of a nude by Leon
D. Harmon and Kenneth C. Knowlton. (Reprinted with permission.)
INTRODUCTION

This  chapter  is  a  brief  introduction  to  the  subject  of  “phenomena 
that  are  made  up  of  other  phenomena,”  a  topic  that  was  introduced  in 
Chapter  2.  Again,  the  discussion  will  be  directed  toward  phenomena 
that  are  discrete  and  mathematically  describable. 

Even  though  all  mathematically  describable,  discrete  phenomena 
can  presumably  be  represented  by  Turing  machines,  there  are  many 
reasons for considering “phenomena that are made up of other
phenomena” in more detail. While a given phenomenon may be easily
described by a machine (i.e., a Turing machine, a program for a
universal Turing machine, etc.), this is not the case for all phenomena.
If  a  given  phenomenon  is  most  easily  described  by  referring  to  the 
actions  of  several  machines,  it  is  said  to  be  a  multiprocess  and  to  in¬ 
volve  multiprocessing.  If  the  description  of  a  multiprocess  specifies  that 
some  of  its  machines  perform  their  actions  simultaneously,  then  the 
phenomenon  is  called  a  parallel  process,  and  is  said  to  involve  parallel 
processing. 
MOTIVATIONS

The  basic  reasons  for  investigating  parallel  processes  in  this  book 
are  as  follows: 

1.  Our  knowledge  of  the  real  world  is  often  most  easily  de¬ 
scribed  by  reference  to  parallel  processes:  “While  X  was 
happening,  Y  happened  whenever  Z  happened.”  In  particular, 
natural  intelligence  seems  to  involve  extensive  parallel  proc¬ 
essing. 

2.  Although  there  are  limits  to  the  computational  ability  of  any 
machine,  the  limits  for  parallel  machines  are  more  remote 
than  those  for  serial  machines. 

3.  We  expect  that  ultimate  investigations  of  artificial  intelligence 
will  be  concerned  with  the  problem-solving  capacities  of 
parallel  and  multiprocessors  in  which  each  component  is 
artificially  intelligent. 

4. An important problem for AI research is that of finding good
representations  for  processes.  Even  the  relatively  simple  rep¬ 
resentations  discussed  in  this  chapter  are  capable  of  being 
used  to  describe  some  very  lifelike  behaviors.  Together  with 
the  previous  discussions  of  programming  languages  such  as 
PLANNER and QA4, this chapter serves as an introduction to
the  study  of  process  representations. 

The emphasis of this chapter is primarily theoretical. It will give no
coverage  of  current  parallel  computer  systems  and  languages,  but  will 
refer  the  reader  to  Findler  and  McKinsie  (1969),  Hewitt  (1970a,b), 
Tesler  and  Enea  (1968),  Chamberlin  (1971),  Riley  (1970),  Graham 
(1970),  Potvin  (1971),  and  Slotnick  (1967).  Rather,  an  attempt  will 
be  made  to  summarize  what  is  known  about  the  theoretical  abilities  of 
parallel  processors.  Thus,  the  discussions  will  involve  cellular  automata, 
self-reproduction, self-description, Myhill’s theory of “self-improvement,”
self-organizing systems, hierarchical systems, evolutionary systems,
evolutionary stagnation, and other related topics. Although the
first  few  pages  of  each  section  are  easy,  most  of  this  chapter  is  fairly 
difficult.  However,  the  final  section  is  relatively  simple  all  the  way 
through.  For  other  general  discussions  on  parallel  systems,  the  reader 
is  invited  to  see  Ershov  (1971),  Mesarovic  (1969),  von  Bertalanffy 
(1968),  Varshavsky  (1969),  and  Dijkstra  (1965  et  seq.). 



Parallel  processing  and  evolutionary  systems 

CELLULAR  AUTOMATA 

Given  that  Turing  machines  and  finite  automata  are  efficient  de¬ 
scriptions  of  simple  serial  phenomena,  one  would  naturally  expect  the 
automata  theorist  to  look  for  mathematical  ways  of  saying,  “While  X 
was happening, Y happened whenever Z happened,” and defining his
X’s, Y’s, and Z’s to be finite-state automata or Turing machines. This
expectation is justified: The mathematical formalizations for parallel
process  so  far  developed  are  all  essentially  ways  of  describing  complex 
machines  that  are  made  up  of  “interrelating”  finite-state  automata,  Tur¬ 
ing  machines,  or  other  types  of  information  processors.  Two  such  mathe¬ 
matical  formalizations  are  discussed  and  a  third  is  described. 

The  first  formalization  is  that  of  the  theory  of  cellular  automata 
(see  Codd,  1968;  Burks,  1970;  A.  R.  Smith,  1969).  At  the  outset  it 
should  be  mentioned  that  the  cells  of  a  cellular  automaton  are  not 
necessarily  intended  as  models  of  their  biological  counterparts,  the  cells 
that  comprise  living  organisms.  The  fact  that  this  correlation  does  not 
necessarily  exist  is  responsible  for  the  other  common  name  given  to  this 
type  of  machine,  “tesselation  automaton.” 

Briefly, a cellular automaton is a graph (note 8-1) whose nodes
are finite-state machines (see Fig. 8-1a). The operation of a cellular
automaton  is  determined  by  information  passed  between  those  nodes 
that  are  connected;  the  machine  at  each  node  receives  the  outputs  of 
the  machines  at  those  nodes  that  connect  to  it.^  Often,  cellular  automata 
are  defined  as  being  graphs  of  some  simple  nature,  say  that  of  an  Abelian 
group  (note  8-2),  and  in  most  cases  the  interconnections  between  nodes 
pass information bidirectionally (Fig. 8-1b). The important thing about
this  type  of  machine  is  that  the  underlying  graph  of  a  given  cellular 
automaton  is  considered  to  be  fixed,  and  is  not  capable  of  being  altered 
by  any  of  its  nodes;  this  is  the  reason  we  can  define  the  machine  at 
each  node  by  a  simple  finite-state  function. 
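
The description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a construction taken from the literature cited: the three-node ring, the node names, and the exclusive-or rule are all hypothetical, and each machine's output is simply taken to be its current state. What the sketch preserves is the essential point that the graph (`inputs`) is fixed and every node updates simultaneously from the outputs of the nodes connected to it.

```python
def step(states, inputs, rules):
    """Advance every node by one time step, all nodes updating simultaneously.

    states: node -> current state
    inputs: node -> list of nodes whose outputs it receives (the fixed graph)
    rules:  node -> finite-state function (own_state, received_states) -> next state
    """
    return {
        node: rules[node](states[node], tuple(states[src] for src in inputs[node]))
        for node in states
    }


# Hypothetical example: a ring a -> b -> c -> a in which each node
# replaces its state by (own state XOR state of its predecessor).
inputs = {"a": ["c"], "b": ["a"], "c": ["b"]}
rules = {node: (lambda own, received: own ^ received[0]) for node in inputs}

states = {"a": 1, "b": 0, "c": 0}
history = [states]
for _ in range(3):
    states = step(states, inputs, rules)
    history.append(states)

for t, configuration in enumerate(history):
    print(t, configuration)
```

Because `step` builds a new dictionary from the old one, no node ever sees a neighbor's state from the same time step, which is what simultaneous operation requires.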

A person observing a cellular automaton will consequently see its
nodes changing state with time, each state affecting the others, etc. If
the states used by the machines at the different nodes are the same, he
may observe these states to be “flowing” throughout the graph in an
interdependent manner. For this reason the underlying graph of a
cellular automaton is also called a space. However, the states and
space of a cellular automaton are not to be confused with the state
space of a state-space problem.

^ One natural generalization of the cellular automata formalism, pursued by
Luconi (1968), Martin and Estrin (1969), Rodriguez (1969), and others, is to
allow the nodes of the graph to be arbitrary information-processing machines
and the arcs between nodes to be channels that may carry arbitrary data
structures. A further generalization, suggested in a later section, is that the
nodes of the graph should be capable of changing their relationships (arcs)
to each other.

Figure 8-1. Simple graphs of a cellular automaton: panels (a) and (b).

By far the greatest amount of research on cellular automata (note
8-3) has been devoted to cellular automata whose underlying graphs
have  the  nature  of  an  Abelian  group;  that  is,  where  the  network  of 
nodes  forms  either  an  n-dimensional  Cartesian  grid,  cylindrical  grid,  or 
toroidal  grid  (Fig.  8-2).  Most  of  this  research  has  also  dealt  with 
cellular  automata  in  which  all  cells  or  nodes  of  a  given  automaton  are 
assigned  the  same  finite-state  machine  (different  nodes  may  start  in 
different initial states, however).
In some respects this is less general than the study of cellular
automata  that  can  have  any  underlying  graph  and  any  consistent^  assign¬ 
ment  of  machines  to  the  nodes  of  that  graph;  even  so,  the  study  of 
“Abelian-group  cellular  automata”  has  shown  that  they  can  describe 
some  interesting  processes,  such  as  “self-reproduction.”  Since  the  for¬ 
malization  for  these  automata  is  relatively  easy  to  present,  this  section 
is  confined  to  a  discussion  of  Abelian-group  cellular  automata.  The 
question  of  generality  is  pursued  in  the  next  two  sections. 

DEFINITION 8-1. A (finitely generated, Abelian group)
cellular automaton (ABC) $\Gamma$ is an ordered quintuple:

$\Gamma = (Q, G', \cdot, f, q_0)$

where

1. $Q$ is a set of states.

2. $G' = \{g_1, g_2, \ldots, g_m\}$ is a generator set of a finitely
generated Abelian group $G$ having group operation “$\cdot$”.

3. $f$ is the local transition function, a mapping from
$Q^{m+1}$ to $Q$.

4. $q_0$ is the quiescent state, such that $f(q_0, \ldots, q_0) = q_0$.

The neighborhood of any node (i.e., element) g in G is defined as
the set $N(g) = \{g, g \cdot g_1, g \cdot g_2, \ldots, g \cdot g_m\}$. The meaning of the
local transition function f, then, is that any assignment of states to the
neighborhood for a node g determines uniquely the next state of g. (There is,
incidentally, no loss of generality in our having defined the local
transition function f to depend only on the states of the nodes in N(g)
rather than on output symbols from these nodes.)

A configuration c is an assignment of states to all nodes, or cells,
of a cellular automaton. A finite configuration is one in which all but a
finite number of cells are assigned the quiescent state $q_0$. The operation
of an ABC is assumed to proceed in unit time-intervals, $t_0$, $t_1 = t_0 + 1$,
$t_2 = t_1 + 1$, ..., the local transition function being applied
simultaneously to all cells of the ABC during each unit time-interval, thus
determining a sequence of configurations $c_0, c_1, c_2, \ldots$. It will also be
assumed that each cell requires the entire unit time-interval to carry out
the operations (accept input, compute output and next state, go to next
state, emit output) defined for it by the transition function. (This
condition is relaxed by some authors.) The simultaneous application of f
to all nodes of the ABC is, in effect, the application of a global transition
function $F: C \to C$, where C is the set of all configurations of the ABC.

^ The transition function of the finite-state machine (see Chapter 2) at a
given node must, of course, agree with the input and output capabilities of
that node.
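
As a rough illustration of how the global transition function arises from the local one, the following Python sketch applies a local function f simultaneously to a finite configuration on an integer grid. The names (`global_step`, `neighborhood`, `xor_of_neighbors`) and the one-dimensional example rule are assumptions made for this sketch only; they do not come from the text. The group operation is taken to be componentwise addition, and only non-quiescent cells are stored, which is safe because f maps an all-quiescent neighborhood to the quiescent state.

```python
def add(g, h):
    """Group operation assumed here: componentwise addition on the integer grid."""
    return tuple(a + b for a, b in zip(g, h))


def neighborhood(g, generators):
    """N(g) = {g, g.g1, ..., g.gm}, returned in that fixed order."""
    return [g] + [add(g, gi) for gi in generators]


def global_step(config, generators, f, q0=0):
    """Apply the local function f to every cell at once.

    config maps each non-quiescent cell to its state (a finite configuration).
    Only cells whose neighborhood meets the configuration can leave q0.
    """
    candidates = set(config)
    for g in config:
        for gi in generators:
            candidates.add(add(g, tuple(-x for x in gi)))
    next_config = {}
    for g in candidates:
        state = f(tuple(config.get(n, q0) for n in neighborhood(g, generators)))
        if state != q0:
            next_config[g] = state
    return next_config


# Hypothetical one-dimensional example: generators {+1, -1}; a cell's next
# state is the exclusive-or of its two neighbors, so f(0, 0, 0) = 0.
GENERATORS = [(1,), (-1,)]


def xor_of_neighbors(states):
    _own, right, left = states
    return right ^ left


c0 = {(0,): 1}
c1 = global_step(c0, GENERATORS, xor_of_neighbors)
c2 = global_step(c1, GENERATORS, xor_of_neighbors)
print(sorted(c1), sorted(c2))
```

Each call to `global_step` reads only the old configuration and writes a new one, so the sequence `c0`, `c1`, `c2` plays the role of the sequence of configurations described above.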

EXAMPLE 8-1. (CONWAY’S “LIFE” CELLULAR AUTOMATON.) Let
$Q = \{0, 1\}$, $q_0 = 0$, and let G be the Abelian group generated by

$G' = \{(1,0), (0,1), (1,1), (-1,0), (0,-1), (-1,-1), (1,-1), (-1,1)\}$

under the operation of vector addition (i.e., G corresponds to
the infinite two-dimensional Cartesian grid), and let f be defined
as follows: (Figure 8-3 shows the (“Moore”) neighborhood
N(g) for a given node, or cell g, determined by this generator
set $G'$.)

1.  If  at  time  t  the  state  of  cell  g  is  0  (g  is  “dead”)  and 
there  are  exactly  three  “living”  cells  (cells  in  state  1) 
in N(g), then at time t + 1 the state of cell g will be
1  (i.e.,  g  will  “give  birth”  and  become  a  living  cell). 

2.  If  at  time  t  cell  g  is  living  and  there  are  exactly  two 
or  exactly  three  other  living  cells  in  its  neighborhood, 
then  at  time  t  +  1  cell  g  will  still  be  living. 

3.  If  at  time  t  cell  g  and  its  neighborhood  do  not  satisfy 
either  condition  (1)  or  (2),  then  at  time  t  +  1  cell  g 
will  be  in  state  0. 

These  three  conditions  adequately  define  f  and  enable  us,  given  any 
configuration  of  living  and  dead  cells  at  time  t,  to  effectively  determine 
which cells will be living or dead at time t + 1.
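
The three conditions translate directly into a short program. The sketch below is one possible rendering, assuming a finite configuration stored as a set of living cells; the function name `life_step` and the three-cell test pattern are choices made for this illustration and do not appear in the text.

```python
from collections import Counter

# The generator set G' of Example 8-1: the eight Moore-neighborhood offsets.
MOORE = [(1, 0), (0, 1), (1, 1), (-1, 0), (0, -1), (-1, -1), (1, -1), (-1, 1)]


def life_step(live):
    """Return the set of living cells at time t + 1, given those living at time t."""
    # For every cell, count how many of its eight neighbors are living.
    counts = Counter((x + dx, y + dy) for (x, y) in live for (dx, dy) in MOORE)
    return {
        cell
        for cell, n in counts.items()
        # Condition 1: a dead cell with exactly three living neighbors is born.
        # Condition 2: a living cell with two or three living neighbors survives.
        # Condition 3: every other cell is in state 0 and so is not stored.
        if n == 3 or (n == 2 and cell in live)
    }


# A row of three living cells alternates between horizontal and vertical.
row = {(0, 0), (1, 0), (2, 0)}
column = life_step(row)
print(sorted(column))
print(life_step(column) == row)
```

Cells with no living neighbors never appear in `counts`, which is how the quiescent state is handled without storing the infinite grid.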

The reader should trace the sequence of configurations shown in
Fig. 8-4 to verify this for himself (in this figure the cells of the
automaton space have been drawn as squares: Fig. 8-4a shows the
neighborhood of a cell g corresponding to that indicated by Fig. 8-3). Figure
8-5 shows a Cheshire cat configuration, which fades to a grin, then
disappears, leaving a pawprint.

Figure 8-3. The Moore neighborhood.

Figure 8-4. (a) A redrawing of Figure 8-3; (b) $c_0$, a right-pentamino;
(c) $c_1$; (d) $c_2$; (e) $c_3$.


It will be shown in a later section that the ABC’s are a very general
class of machine in that some of them are capable of simulating the
computations of the universal Turing machines. From the standpoint
of efficiency in representation, however, there are some drawbacks to
the use of ABC’s as a formalization for the concept of parallel process in
general. The major disadvantage is the unchangeability of the underlying
graph of a given ABC. One might often like to have some way of
easily describing systems in which the relations existing between
machines are capable of changing with time, depending on the previous
operation of the machines themselves.


ABELIAN  MACHINE  SPACES 

Given  the  simplicity  of  Abelian  groups  as  the  underlying  graphs 
or  spaces  for  cellular  automata,  one  natural  first  choice  in  attempting 
a  more  general  (yet  still  relatively  simple)  formalization  for  parallel 
process  would  be  to  find  some  method  whereby  the  neighborhood  of  a 
cell could be allowed to wander throughout a constant Abelian space.



Figure 8-5. Computer-generated “Cheshire” cat (0) fades to a grin (6)
and finally to a pawprint (8). (From “Mathematical Games” by Martin
Gardner. Copyright © 1971 by Scientific American, Inc. All rights
reserved.)

In  this  respect  it  is  well  to  reconsider  the  subject  of  Turing  machines: 

The point to make is that the tape of a Turing machine (TM) is
essentially a finitely generated Abelian group. Consider the case of a
linear Turing machine tape, divided into squares: Each square can be
uniquely specified by a single integer (positive or negative), as shown
in Fig. 8-6. The set of integers can, however, be generated by the finite
set {1, -1} under the (commutative) operation of addition. So, the
tape is a finitely generated Abelian group.

Thus, a Turing machine is essentially a finite-state automaton that
can “wander” throughout the space determined by an Abelian group.
The “neighborhood” of a Turing machine is the particular cell it
happens to be scanning or writing on at any given moment. Also, the
directions in which the tapehead of the machine can move may be
considered equivalent to particular elements of a generator set being
used by the machine to describe the topology-of-motion on its tape.
(The Turing machines defined in this book have used the generator
set {-1, 0, 1}.)
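The tape-as-group view can be made concrete with a small sketch. The Python below is an illustration only, not a construction from the text: it indexes the tape by integers and restricts every head move to an element of the generator set {-1, 0, 1}. The names `GENERATORS` and `run`, the blank symbol "b", and the bit-flipping program are all assumptions chosen for the example.

```python
from collections import defaultdict

# Generator set used for head motion: left, stay, right.
GENERATORS = (-1, 0, 1)


def run(program, tape_symbols, state="q0", halt="halt", blank="b", max_steps=1000):
    """Run a Turing machine whose tape squares are indexed by integers.

    program maps (state, scanned symbol) -> (written symbol, move, next state),
    where every move must be an element of GENERATORS.
    """
    tape = defaultdict(lambda: blank, enumerate(tape_symbols))
    position = 0  # the head starts on square 0
    for _ in range(max_steps):
        if state == halt:
            break
        written, move, state = program[(state, tape[position])]
        assert move in GENERATORS, "head may only move by a generator"
        tape[position] = written
        position += move  # group operation: integer addition
    return tape, position, state


# Hypothetical example program: flip every bit, halt on the first blank.
flip_bits = {
    ("q0", "0"): ("1", 1, "q0"),
    ("q0", "1"): ("0", 1, "q0"),
    ("q0", "b"): ("b", 0, "halt"),
}

tape, position, state = run(flip_bits, "0110")
print("".join(tape[k] for k in range(4)), position, state)
```

Replacing the integer position with a tuple of integers, and the moves with vectors from a larger generator set, would give the same machine a multi-dimensional grid to wander in.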

A more general formalization for parallel process is then very
simply contrived. We let the underlying space of the process be an
Abelian group G, that is, a (possibly infinite) n-dimensional Cartesian
grid, cylindrical grid, or toroidal grid. We let the cells of the process be
polycephalic Turing machines, and for each cell specify the initial
position in G of its tapeheads (some cells may also have their own
separate tapes, not printed on, and perhaps not read, by other cells).
And we specify what shall happen whenever two or more cells choose
to print different symbols at the same time on the same square, or node,
of G. There may possibly be an infinite number of cells, but we require
that each cell be described by one of a finite number of TM's; also, we
assume that each square is initially occupied by only a finite and
computable set of tapeheads. If we specify the initial symbols assigned to
the nodes of G and require that all cells act simultaneously, always
performing their next-move functions in the same unit interval of time, then
the subsequent configurations of symbols within G will be well defined.
Figure 8-7 illustrates this formalization for parallel process, which we
shall refer to as an Abelian machine space (AMS).

Our  proverbial  outside  observer,  watching  the  operation  of  a 
given  AMS,  could  choose  to  concentrate  either  on  the  flow  of  symbols 
throughout  its  space  G  or  on  the  changing  of  the  states  of  its  Turing- 
machine  cells.  In  this  model,  then,  a  cell  is  distinct  from  a  square  or 
node  of  the  space  and  is,  rather,  identified  with  a  possibly  shifting  set 
of  “interdependent  positions”  in  the  space. 

There are several different ways to go about solving the problem
of what will happen if two or more tapeheads (possibly from the same
cell or possibly not) attempt to print different symbols on the same
square during the same unit time-interval. One way is to decide the
actually printed symbol on the basis of a dominance relation on the
total set of tapeheads.

Figure 8-7. Abelian machine space.

Another simple, rather elegant way to solve this “conflict of print
commands” problem is to stipulate that the total set of all symbols
used by the cells of the AMS itself forms a group,* under the operation of
superposition. That is, the symbols are designed in such a way that any
sequence of printing one symbol over another will always yield a new,
recognizable symbol. For example, we might use four symbols: -, |, +,
and a fourth symbol that acts as a “splash of white paint” covering any
previous symbol. This solution to the conflict-of-print-commands
problem in an AMS does, however, require that the set of symbols form an
Abelian group under the operation of “instantaneous superposition.”
The reason for this is that there is no “order” to the superposition of
symbols as they are printed within a given unit time-interval by different
tapeheads; it has been assumed that all the cells of the AMS carry out
the operations of their next-move functions simultaneously within each
unit time-interval. Thus, the “instantaneous superposition” yx must
equal the instantaneous superposition xy (see Exercise 8-1).

* See Chapter 2.
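Why commutativity matters can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the symbol set of the text: the four symbols are encoded here as 2-bit masks and superposition is taken to be bitwise XOR, so the blank is the identity and every symbol is its own inverse. The names `resolve` and `ams_print_step` are hypothetical.

```python
from functools import reduce
from itertools import permutations
from operator import xor

# Assumed encoding (not from the text): one bit for a horizontal stroke,
# one bit for a vertical stroke. Under XOR these four values form an
# Abelian group (the Klein four-group).
BLANK, HORIZONTAL, VERTICAL, PLUS = 0b00, 0b01, 0b10, 0b11


def resolve(current_symbol, print_commands):
    """Superpose all simultaneously printed symbols onto the current one."""
    return reduce(xor, print_commands, current_symbol)


def ams_print_step(grid, commands):
    """Apply one unit time-interval of print commands.

    grid maps a square (any hashable position) to its symbol;
    commands is an iterable of (square, symbol) pairs issued simultaneously.
    """
    pending = {}
    for square, symbol in commands:
        pending.setdefault(square, []).append(symbol)
    new_grid = dict(grid)
    for square, symbols in pending.items():
        new_grid[square] = resolve(grid.get(square, BLANK), symbols)
    return new_grid


commands = [((0, 0), HORIZONTAL), ((0, 0), VERTICAL), ((1, 0), PLUS)]
results = {
    tuple(sorted(ams_print_step({}, order).items()))
    for order in permutations(commands)
}
print(len(results) == 1)  # every ordering of the commands gives the same grid
```

Because XOR is commutative and associative, the order in which the tapeheads are enumerated cannot affect the printed result, which is the property the paragraph above requires of instantaneous superposition.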

Both ways of solving the conflict-of-print-commands problem can
induce a “partial dominance” relation on the set of cells in an AMS,
such that one cell dominates another insofar as it prints symbols that
override those printed by the other. This can induce a type of
“long-range dominance” on cells. In an AMS, when several cells scan a given
square, they are each affected by the symbol that is already there. When
they each decide to print their respective symbols on the square, their
decisions must therefore be made on the basis of each cell’s own
current state and the previous symbols printed on the squares of the space;
the transition function of a given cell does not depend on the current
states of the other cells. However, the symbol already printed on a
given square depends in general upon a previous application of the
decision rule for the conflict-of-print-commands problem. Consequently,
the cells with the greatest long-range effect on other cells (those that
eventually have their output observed most often by other cells and
consequently can be said to control the process as a whole) are the cells
that are greatest (if there are any greatest) under the partial dominance
relation on cells induced by the decision rule for the
conflict-of-print-commands problem.

Whether or not the introduction of “long-range dominance” in
this sense is desirable in an actual construction of an AMS would, of
course, depend on the application one has in mind for the machine.
One way of solving the conflict-of-print-commands problem, which
does not have this type of long-range dominance, is to specify that
each square of space be associated with a unique cell that has an
immovable tapehead attached to that square, and that each square shall
record only the symbols dictated by the immovable tapehead that scans
it. Thus, all the moving tapeheads will become scanners. (This is
essentially the method adopted in the Holland (1960) iterative circuit
computers.)

QUESTIONS OF GENERALITY AND
EQUIVALENCE 

The formalizations for parallel process so far discussed have
provided a major context within which mathematicians (computer
scientists, systems analysts) have (to date) approached the subject of parallel
processes in general. Some other formalizations for “parallel process”
have been suggested by Rodriguez (1969), Tesler and Enea (1968),
Luconi (1968), Martin and Estrin (1969), and Dijkstra (1965 et seq.).

The reader may naturally wonder if these investigations could be
carried further: Could we not develop a formalization for parallel
processes in which the basic components, or cells, of a given process are
enabled to change its underlying space?

Such a formalization can be developed, but in fact it will not
be any more general than that provided by the Abelian machine spaces.
To see this, let us consider that the space of a given parallel process
is represented by a simple structure, of the sort defined in Chapter 7.
At a given time the individual cells of the process will make up the
space of the process by existing “in relation” to one another so as to
form a structure (see Fig. 8-8). Presumably, each cell will be able to
observe those cells to which it is related (which form its “neighborhood
structure”), and alter its neighborhood structure by either removing or
adding relations within it. It is not too difficult to arrange a consistent
formulation of this idea, such that all cells operate simultaneously,
within unit time-intervals, and such that the total structure (space) of
the process will be changed with time by its cells. However, any such
self-affecting space (note 8-4) can, given that it satisfies certain
finitistic considerations,* be effectively simulated by a suitable AMS. We
would merely require that some of the squares of the AMS be used to
hold a current description of the given space structure and that the
polycephalic cells of the AMS be designed so as to suitably alter that
description; the underlying, Abelian space of the AMS would itself not
change.

* For example, each cell should be describable as a finite automaton or
Turing machine; each neighborhood structure should be finite, etc. A good way
to implement these self-affecting spaces might be to construct “PLANNER-spaces,”
in which the relations between certain nodes or collections of nodes would be
controlled by PLANNER theorems, each theorem controlling its own collection of
nodes and operating in parallel with the others. There would, of course, have to
be a special procedure for resolving conflicts of commands. (See Hewitt, 1970,
section 4.6.1.1.2.)

Figure 8-8. Space-structures.

In light of this conclusion, one might naturally wonder whether
the AMS formalization is really more general than that of the ABC's.
Can any Abelian machine space be simulated by an ABC? The answer to
this question is both yes and no, depending on what meaning is
attached to the concept “simulation.” There are (at least) two equally
valid ways of interpreting the concept; these give different answers
when the ABC's and the AMS's are compared. To discuss these
interpretations, observe first that both ABC's and AMS's are examples of finitely
describable, effectively computable functions. Any given ABC or AMS is
in essence a function that maps the set C of configurations (of states
and symbols) which are possible in its underlying space into that same
set C. By “effectively computable” is meant that the configuration
produced by a given ABC or AMS after any finite amount of time of its
operation can be calculated to any finite extent (i.e., for any finite
number of squares in the underlying Abelian space) by a suitably
programmed, universal Turing machine (note 8-5).

We can show, however, that some ABC's are computation universal,
in the sense that such an ABC can be programmed to carry out the
computation performed by any given Turing machine. Thus, the
operation of any given AMS can be effectively computed by a suitable ABC. In
this sense, the AMS's are not more general than the ABC's, and can be
“effectively simulated” by them.

To prove that there are computation-universal ABC's, it is sufficient
to show that, for any given Turing machine T, there is an ABC that can
carry out the computation it performs on any given input tape. This
immediately implies the result that there are ABC's which can carry out
the computation of any given universal Turing machine on any given
input tape; that is, there exists a single ABC which, given a suitable initial
configuration, will carry out the computation of any given Turing
machine on any given tape.

Following  is  an  outline  of  the  proof  of  A.  R.  Smith  (1969),  to 
which  the  reader  should  refer  for  more  details. 

Let T be a Turing machine, utilizing m symbols and n states. We
can construct an ABC $\Gamma_T$ that will carry out the computation performed
by T on any given input tape, such that $\Gamma_T$ uses max(n + 1, m + 1)
states, an infinite two-dimensional Cartesian grid, and the neighborhood
corresponding to the generator set

G^ = {(0,1), (1,0), (-1,0), (-1,-1), (0,-1), (1,-1)}

(See Fig. 8-9.)

Figure 8-9. The neighborhood about a cell g.

Each cell of $\Gamma_T$ has a set of M = max(m + 1, n + 1) states, Q =
{0, 1, ..., M - 1}, which, “depending on context,” are used to represent
either the states or the symbols of the Turing machine T. The state 0
is the quiescent state of $\Gamma_T$, however, and is not used to represent either
a state or a symbol of T. The blank symbol b of T is to be represented
in $\Gamma_T$ by the state 1. In general, the state $q_i$ of T is to be represented in
$\Gamma_T$ by the state i + 1, and the symbol $x_j$ of T is to be represented in $\Gamma_T$
by the state j + 1; that is, b = $x_0$.

To simulate the computation of T for a given finite input string i,
that string is embedded in a row of the space of $\Gamma_T$, each symbol of the
string i being represented by a corresponding cell state in the row;
the control and tapehead of T are represented by the single cell above the
leftmost end of the row, being placed in state 1, corresponding to the
initial state $q_0$ of T (see Fig. 8-10). All other cells in $\Gamma_T$ are initially
given state 0. At any subsequent time t, the configuration of the Turing
machine T will be represented by a finite row of cells in nonzero state,
above which there is a single cell in nonzero state.
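The embedding just described can be sketched as follows. This Python fragment is a hedged illustration of the encoding only (symbol $x_j$ becomes cell state j + 1, the control cell starts in state 1, everything else is quiescent); it does not implement the transition function of Fig. 8-11. The function name, the coordinate convention (row 0 for the tape, row 1 for the control cell), and the example alphabet are assumptions.

```python
QUIESCENT = 0  # state 0 is reserved and represents neither a state nor a symbol


def initial_configuration(input_string, symbols):
    """Build a sparse initial configuration for the simulating cellular space.

    symbols lists the tape alphabet as x_0, x_1, ... with x_0 the blank.
    Cells absent from the returned dict are in the quiescent state 0.
    Coordinates are (column, row); 'above' is assumed to mean row + 1.
    """
    index = {symbol: j for j, symbol in enumerate(symbols)}
    # Symbol x_j on tape square c becomes cell state j + 1 at (c, 0).
    config = {(col, 0): index[s] + 1 for col, s in enumerate(input_string)}
    # Control/tapehead cell above the leftmost square: q_0 -> state 0 + 1 = 1.
    config[(0, 1)] = 1
    return config


# Hypothetical alphabet: blank "b" = x_0, "0" = x_1, "1" = x_2.
config = initial_configuration("101", ["b", "0", "1"])
print(sorted(config.items()))
```

Note that the same integer can stand for a state in the control cell and for a symbol in the tape row, which is what the text means by states being interpreted “depending on context.”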

Figure 8-11 then gives the basic design of the transition function f
used by the cells of $\Gamma_T$, corresponding to the next-move function P of T.
Nonzero states in the table are represented by the dummy symbol s or
by explicit variables: i + 1 represents state $q_i$ and j + 1 represents
symbol $x_j$, etc. Figure 8-11 shows what will happen for all the various
cells of $\Gamma_T$ during any unit time-interval, provided T is in state $q_i$
scanning symbol $x_j$ and the next-move function P contains the quintuple

$q_i \, x_j \, x_k \, X \, q_l$

where $X \in \{L, 0, R\}$. The bottom two entries in Fig. 8-11 show that
$\Gamma_T$ grows the tape on which it performs its computation at the same
time the computation proper is being carried out. (Any neighborhood
state configuration N(g) not shown in the figure is defined to produce
no change in state for cell g.)

Figure 8-10. An initial configuration for $\Gamma_T$.

The conclusion, again, is that the operation of any given AMS can
be computed to any finite extent by a suitable ABC. However, in general,
a universal Turing machine U given an input tape $(d_T, i)$ requires longer
to compute the result T(i) than does the machine T, given the input
tape i. So, this suggests another question, that of whether the operation
of any given AMS can be computed completely (at a constant speed
ratio to that of the AMS itself) by a suitable ABC. Such an ABC would
constitute a “complete simulation” for the AMS. The answer to this
question is no, subject to our current definition of finite-state automata
within an ABC; that is, we have so far required that all cells within an
ABC operate simultaneously within unit time-intervals. It is not possible,
in particular, for a given cell to operate instantaneously at the beginning
of a unit time-interval and thus pass information with “zero delay”
between unconnected cells. In other words, there is a limit to the speed
at which information concerning one part of the space of an ABC can be
carried by its cells to another part.

[Figure 8-11 is a table pairing neighborhood configurations N(g) at time t
(diagrams not recoverable) with the state of g at time t + 1, given that
$q_i \, x_j \, x_k \, X \, q_l \in P$. The listed next states, in order, are:
k + 1; l + 1 if X = R, 0 otherwise; l + 1 if X = 0, 0 otherwise;
l + 1 if X = L, 0 otherwise; 1; 1.]

Figure 8-11. Basic design of transition function f.

This limit to the speed of information transfer in an ABC can be
used to show (Holland, 1970) that the ABC's are not
composition-universal: There does not exist an ABC that can be used to compute, at
a constant rate, the sequence of configurations of any AMS, or even of
any ABC. If an ABC is being used to reproduce the successive
configurations of another ABC or AMS, it must in some cases require an
increasingly longer and longer amount of time to do so, even for finite
configurations; there is no ABC that can simulate all ABC's in “real time,”
or even at a slower but still constant rate.

The AMS's are composition-universal (and thus cannot be
completely simulated by the ABC's) because the tapeheads of a given AMS
cell are allowed to transmit information with zero delay across varying
distances of the underlying space. One can also modify the formalization
for cellular automata to yield ABC's that are composition-universal: The
modification consists precisely in allowing some cells to carry out their
transition functions instantaneously, whenever they are in certain states,
at the beginning of the unit time-intervals that occupy the operations
of the other cells. Such instantaneously acting cells (“Mealy automata”)
are said to form zero-delay gates for information transfer (note 8-6).

In summary, the two notions of simulation, referred to here as
effective and complete, correspond to two types of universality:
computation universality and composition universality. Both concepts of
universality are of relevance to the study of self-affecting systems. We
shall find that computation universality in a given ABC implies the ability
of that automaton to hold a self-reproducing configuration, which is
itself equivalent to a universal Turing machine; also, it seems very likely
that the composition-universal spaces are those best suited to modeling
evolutionary systems.


SELF-AFFECTING  SYSTEMS: 
SELF-REPRODUCTION 

A  mathematical  system  that  “affects  itself”  is  typically  composed  of 
at  least  two  parts,  A  and  B,  which  bear  the  relation  that 

A  affects  B 

and 


B affects A

The entire system (A, B) is then called self-affecting if the actions of
any part affect the other parts, which in turn affect the original part
(note 8-7).

Equivalently, the study of self-affecting systems is the study of
machines that produce and accept feedback to themselves. This
viewpoint of self-affecting systems lends itself to a study of continuously
self-affecting systems, via analytic function theory, a direction of
research presented in N. Wiener (1948) and Formby (1965). For our
own purposes it is adequate to stick to the descriptions of discrete,
self-affecting processes provided by automata theory.

Many types of self-affecting systems can be studied within the
contexts of cellular automata theory and the theory of Abelian machine
spaces (see Exercise 8-2). Those that seem to be of particular
importance to the field of artificial intelligence are the self-diagnosing and
self-repairing systems, the self-reproducing systems, the self-organizing
systems, and the evolutionary systems. Some of the basic qualities of
self-diagnosing and self-repairing systems are illustrated by the
Exercises at the end of the chapter; for thorough discussions on the
current uses of such systems, see Carter (1971) and Randell (1971).
Self-organizing and evolutionary systems will be discussed in the next
section, and this section will concentrate on the nature of
self-reproducing systems.

The study of self-reproducing systems can be approached from
many different angles. After the discussion of a few such approaches,
the reader should investigate the vast literature for himself: von
Neumann (1966) was the first to investigate it extensively, using
cellular automata theory, and most of the subsequent approaches are due
to his influence. A semi-intuitive argument of von Neumann’s provides
the best introduction to the nature of self-reproducing machines. Let
us assume that there exists a machine A, which is a universal constructor
in the sense that if A is given a finite input tape describing a given
machine X, A will eventually construct X. This is denoted by

$A : d_X \rightarrow X$    (8-1)

where $d_X$ is an input tape describing X. It should be noted that

$A : d_A \rightarrow A$    (8-2)

is not an example of self-reproduction, since after the process 8-2 is
complete, there exist two A machines and only a single tape $d_A$, which
is not specified as being given as input to either of the two A machines.
Rather, the need is for an equation of the form

$Y : d_Y \rightarrow Y : d_Y$    (8-3)

which indicates that there is a single machine $Z = (Y : d_Y)$ such that
$Z \rightarrow Z$. To obtain this, we need two other machines, both
simpler in design than A. The first of these is a machine B, which is
capable of copying any input tape I:

$B : I \rightarrow I$    (8-4)

(After process 8-4 is complete, there will exist two input tapes I.)
The other machine, C, is to be capable of coordinating the actions
of A and B so that the ensemble of machines A + B + C, given an input
tape $d_X$, will operate as follows:

$(A + B + C) : d_X \rightarrow (B + C) : (A : d_X)$
$\rightarrow (C + A + X) : (B : d_X) \rightarrow X : d_X$    (8-5)

That is, C first submits $d_X$ to A, causing A to produce a copy of X;
then C submits $d_X$ to B, causing B to copy $d_X$; then C submits the copy
of $d_X$ to X and allows $X : d_X$ to operate on its own. Let us then denote
the machine (A + B + C) by the symbol D; the result follows
immediately that

$D : d_D \rightarrow D : d_D$    (8-6)

Thus, the machine $E = (D : d_D)$ is self-reproducing. The reader should
note that there are no logic problems with this argument, and that the
result follows directly from the assumption that the three machines
A, B, and C are each finitely describable.
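The shape of this argument can be mirrored in a deliberately toy Python sketch. It is only an analogy under strong assumptions: a “machine” is reduced to a tuple of component names, a description tape is taken to be that same tuple, and the universal constructor A is assumed to exist as the function `construct`. All names here (`Machine`, `construct`, `copy_tape`, `run_ensemble`) are hypothetical and none of this models an actual cellular construction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Machine:
    parts: Tuple[str, ...]                   # component machines, e.g. ("A", "B", "C")
    tape: Optional[Tuple[str, ...]] = None   # attached description tape, if any


def construct(description):
    """A: build the machine X named by the description d_X (assumed to exist)."""
    return Machine(parts=tuple(description))


def copy_tape(description):
    """B: produce a second copy of an input tape."""
    return tuple(description)


def run_ensemble(description):
    """C: coordinate A and B, then hand the copied tape to the new machine."""
    offspring = construct(description)      # A : d_X -> X
    duplicate = copy_tape(description)      # B : d_X -> d_X
    return Machine(offspring.parts, duplicate)  # X : d_X


d_D = ("A", "B", "C")                 # description of D = A + B + C
D_with_tape = Machine(parts=d_D, tape=d_D)
child = run_ensemble(D_with_tape.tape)
print(child == D_with_tape)           # D : d_D -> D : d_D
```

The sketch makes visible where the real difficulty lies: everything except `construct` is trivial, which is why the existence of the universal constructor A is the part that needs proof.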

Of these three machines, the only one that has not been given
an effective description within the argument is A, the universal
constructor; that is, the argument describes A, but not sufficiently to
guarantee that it can actually be built. At the suggestion of S. Ulam, von
Neumann (1966) made the first investigations in cellular automata
theory in an attempt to prove the existence of a universal constructor.
Although he died before he could finish his work, he did prove the
existence of a universal constructor (note 8-8), using a two-dimensional
ABC of 29 states. The constructor itself was effectively described and
shown to occupy about 200,000 squares of the space. Since then, others
have shown that the size and number of states required for a universal
constructor can be considerably reduced (see Codd, 1968).

It is relatively easy to show that there are ABC's in which certain
configurations of states will reproduce. A very simple example, due to E.
Fredkin, uses two states, Q = {0, 1}, for each cell; the (“von
Neumann”) neighborhood corresponding to the generator set

G° = {(1,0), (0,1), (-1,0), (0,-1)}

for the infinite two-dimensional Cartesian grid; and the following
transition function f:

1. If at time t, g is connected (by G°, under vector addition)
to an even number of cells in state 1, then at time t + 1, g
will be in state 0.

2. If at time t, g is connected to an odd number of cells in state
1, then at time t + 1, g will be in state 1.

It is not difficult to prove that any finite initial configuration of 1's
will reproduce itself endlessly in this ABC. Figure 8-12 shows a sequence
of self-reproductions of a “right tromino.”
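The two rules above are short enough to simulate directly. The following Python sketch stores only the cells in state 1 and applies the parity rule over the four-cell neighborhood; the function names and the placement of the tromino at the origin are assumptions made for the example. Because the rule is additive modulo 2, a pattern reappears as four displaced copies after 2^k steps once 2^k exceeds the extent of the pattern, and the sketch checks this for k = 1.

```python
NEIGHBORS = ((1, 0), (0, 1), (-1, 0), (0, -1))  # von Neumann generator set


def step(live):
    """One unit time-interval of the parity rule.

    live is the set of (x, y) cells in state 1. A cell is in state 1 at
    time t + 1 exactly when an odd number of its four neighbors are in
    state 1 at time t.
    """
    counts = {}
    for x, y in live:
        for dx, dy in NEIGHBORS:
            cell = (x + dx, y + dy)
            counts[cell] = counts.get(cell, 0) + 1
    return {cell for cell, n in counts.items() if n % 2 == 1}


def translate(cells, dx, dy):
    return {(x + dx, y + dy) for x, y in cells}


def four_copies(cells, distance):
    """The union of four copies displaced by `distance` along each generator."""
    copies = set()
    for dx, dy in NEIGHBORS:
        copies |= translate(cells, distance * dx, distance * dy)
    return copies


tromino = {(0, 0), (1, 0), (0, 1)}  # a "right tromino"
after_two = step(step(tromino))
print(after_two == four_copies(tromino, 2))
```

Running further steps and comparing against `four_copies(tromino, 4)`, `four_copies(tromino, 8)`, and so on is one way to explore the sequence of replications that Figure 8-12 depicts.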


Figure 8-12. The self-replications of a “right tromino.” (From
“Mathematical Games” by Martin Gardner. Copyright © 1971 by Scientific
American, Inc. All rights reserved.)


From the standpoint of automata theory (and artificial
intelligence), it is important to search for a more general type of
self-reproduction. The need is for an ABC in which there is a configuration
that reproduces and which can also carry on some type of universal
processing activity. Thus, we have another reason for von Neumann’s
motivation to show the existence of a universal constructor. (Fredkin’s
ABC mentioned above is not capable of holding a universal-constructor
or universal-computer configuration.) Rather than reproduce a lengthy
universal-construction proof, it is sufficient merely to summarize A. R.
Smith’s (1969) proof that there exist ABC's that can hold
self-reproducing, computation-universal configurations.

The preceding section showed that there exist ABC's that are
computation-universal. Such an ABC, given an initial finite configuration
corresponding to the machine-tape pair (T, i), will carry out the
computation of T, given the input tape i. Suppose that T given i yields the
(finite) output string j, which is denoted*

$i \vdash_T j$    (8-7)

Similarly, if $\Gamma$ is a universal ABC, we denote its operation on $(d_T, i)$ by

$(d_T, i) \vdash_\Gamma j$    (8-8)

Finally, if the tapehead-control cell of $\Gamma$ is scanning the leftmost square
of a finite row x, we write

$\underset{\uparrow}{x}$    (8-9)

To establish our result, we shall need to use the “fixed point” recursion
theorem.

* The notation of formulas 8-7 through 8-11 is similar to, but not to be
confused with, that of formulas 8-1 through 8-6.

THEOREM. For any total recursive function h mapping programs
into programs, there exists a program P such that h(P) = P.

(A function is said to be total if it is defined for all elements of its
domain; a function is recursive if it is expressible as a Turing machine
program.)

LEMMA 8-1. For any arbitrary encoding function d and for any
arbitrary partial recursive function g, there exists a program P
such that

$\underset{\uparrow}{i} \vdash_P ((d_P, i), j, \underset{\uparrow}{(d_P, i)}) \vdash_P ((d_P, i), j, (d_P, i), j, \underset{\uparrow}{(d_P, i)}) \vdash_P \cdots$    (8-10)

if g(i) = j is defined. (P is said to be self-describing.)

Sketch of Proof. We can define a function h from programs to
programs such that

(a) $\underset{\uparrow}{i} \vdash_{h(Q)} ((d_Q, i), j, \underset{\uparrow}{(d_Q, i)})$

(b) $\underset{\uparrow}{(d_Q, i)} \vdash_{h(Q)} ((d_Q, i), j, \underset{\uparrow}{(d_Q, i)})$

This can be done because, given that h(Q) is in its initial state scanning
the leftmost symbol of a string x, it can always decide whether x is of
the form $(d_Q, i)$ for some i (it knows the function Q and d; therefore
it can compute $d_Q$, compare it to the leftmost part of x, etc.). If x is
not of this form, h can be designed to perform step (a), which consists
basically of setting i = x, computing $(d_Q, i)$, computing g(i) = j,
copying $(d_Q, i)$, and going into its halt state (= its initial state) at the proper
place. Step (b) requires all but the first two parts of step (a).
However, the fact that d is a total recursive function (which follows from
the nature of an encoding function; see Chapter 2) implies that h is
total recursive. So, by the recursion theorem stated above, we know
there must exist a program P such that h(P) = P and

$\underset{\uparrow}{i} \vdash_P ((d_P, i), j, \underset{\uparrow}{(d_P, i)}) \vdash_P ((d_P, i), j, (d_P, i), j, \underset{\uparrow}{(d_P, i)}) \vdash_P \cdots$

thus concluding our proof.

THEOREM 8-1. Let $\Gamma$ be a computation-universal ABC. There
exists a finite configuration $c_0$ of $\Gamma$ which is self-reproducing and
computes an arbitrary partial recursive function g.

Proof. Let $c_0$ be $\underset{\uparrow}{(d_P, i)}$. Then

$\underset{\uparrow}{(d_P, i)} \vdash_\Gamma ((d_P, i), j, \underset{\uparrow}{(d_P, i)})$    (8-11)

if j = g(i) is defined.

COROLLARY 8-1. If $\Gamma$ is a computation-universal ABC, then
there exists a finite configuration $c_0$ of $\Gamma$ which is self-reproducing and
computation-universal.

Proof. Let g be the universal Turing machine function.

The  existence  of  self-describing  machines  is  more  than  a  theoretical 
result  of  automata  theory.  Thatcher  (1963)  gave  an  explicit  2532- 
instruction  program  for  a  self-describing  machine;  thus,  it  is  possible 
for  programs  to  reproduce  themselves  inside  a  computer. 
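A far smaller illustration of the same idea is the familiar self-printing program. The Python sketch below is not Thatcher's program and makes no claim about its structure; it simply builds a two-line program from a template that contains its own description, runs it, and compares the output with the program text. The names `template`, `program`, and `run` are assumptions for the example.

```python
import contextlib
import io

# The template plays the role of the description: formatting it with its
# own repr yields a program that carries a complete copy of itself.
template = 's = %r\nprint(s %% s)'
program = template % template


def run(source):
    """Execute source in a fresh namespace and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


output = run(program)
print(output == program + "\n")  # the program prints exactly its own text
```

The division of labor loosely parallels the constructor argument given earlier in this section: the string `s` is the description tape, `s % s` copies and interprets it, and the printed result is again a runnable program with its tape attached.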

Myhill (1964) investigated self-reproducing machines from a
recursion-theoretic viewpoint also, although his results were not
concerned specifically with cellular automata. The principal result of his
studies was that there exists a sequence of self-improving machines
$M_0, M_1, M_2, \ldots$, such that each machine constructs the next one. The
machines are improvements over each other in the following respect:
The first machine effectively proves all decidable propositions in a
given recursive axiomatization of arithmetic; the second machine uses
an expanded recursive axiomatization of arithmetic and effectively
proves all the decidable propositions in its own axiomatization,
including some that are undecidable in the axiomatization of the first;
the third machine does the same for the second, and so on (see Chapter

The self-improvement of these machines is, however, not fully
effective: It can be shown that some propositions of arithmetic are
undecidable for every machine in the sequence; there is no
(mathematically describable) sequence of consistent machines which
effectively decides the truth or falsity of every proposition of arithmetic.
Still, Myhill's results did show that machines can not only reproduce
themselves but, in a sense, also develop themselves.

The reader may have noted by this time that, except for Fredkin’s
ABC, all self-reproducing systems so far discussed have operated in a
highly serial manner, despite orientation of the discussion (at least in
the case of von Neumann and Smith) toward cellular automata and
parallel processes. This is in fact the current state of affairs for the study
of self-reproducing configurations in cellular automata. So far as the
present author is aware, no one has as yet demonstrated an ABC
configuration that is a universal computer and which self-reproduces in a
highly parallel manner.

The problem seems to be much easier to deal with in the Abelian
machine spaces, where we can obtain parallel universal computation
rather trivially, merely by requiring that all polycephalic cells of the
AMS be universal computers. This also enables them to generate programs
for each other and to program each other. We might then specify that all
cells of the AMS initially have blank input tapes, except for the one
cell C_0, whose input tape contains the description d_P for a
self-describing universal program P, which contains within it a
description d_Γ for a finite cellular automaton Γ, and contains within
it a description for an “activated portion” of d_Γ.

The nature of C_0, given d_P, is that C_0 will print “subactivations”
of d_P on the input tapes of two cells (say, C_1 and C_2) and erase the
input tape of C_0. By a subactivation of d_P is meant a new description
d'_P, which is identical with d_P except for its reference to an
“activated portion” of d_Γ; the activated portion of d_Γ described in
d'_P should be contained within the activated portion of d_Γ described
within d_P. The process is to be carried in a similar manner down the
levels of activation allowed in d_Γ, with the end result being that
instead of C_0 (the “fertilized egg”), there will be a set of cells
{C_i}, each “activated” to be a single cell of Γ, and all connected
together within the space of the AMS by their tapeheads so as to form
the cellular automaton Γ. The construction is complete if we design Γ
to be able to program the original d_P (specifying complete activation
of d_Γ) into a cell of the AMS. Then Γ will be a finite automaton that
reproduces in a highly parallel manner; giving it universal-computing
ability would probably not be too difficult.
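
The subactivation process can be pictured with a minimal sketch. It assumes, purely for illustration, that an activated portion of the description is a contiguous range of cell indices and that every cell splits its range roughly in half; neither assumption comes from the construction itself, and the names `subactivate` and `develop` are hypothetical.

```python
def subactivate(portion):
    """Split an activated portion (lo, hi) into two sub-portions.

    Assumption: a portion is a half-open index range over the cells of
    the target automaton; the text does not fix any representation.
    """
    lo, hi = portion
    mid = (lo + hi) // 2
    return (lo, mid), (mid, hi)


def develop(n_cells):
    """Grow n_cells single-cell activations from one 'fertilized egg'.

    Returns the final portions and the number of parallel rounds used.
    """
    generation = [(0, n_cells)]  # the egg cell holds the fully activated description
    rounds = 0
    while any(hi - lo > 1 for lo, hi in generation):
        next_generation = []
        for portion in generation:
            lo, hi = portion
            if hi - lo > 1:
                # The cell writes two subactivations and gives up its own tape.
                next_generation.extend(subactivate(portion))
            else:
                next_generation.append(portion)  # already a single cell
        generation = next_generation
        rounds += 1
    return generation, rounds


cells, rounds = develop(8)
print(cells)   # eight single-cell portions
print(rounds)  # number of parallel splitting rounds
```

Because every cell in a generation splits at the same time, the number of rounds in this sketch grows with the logarithm of the number of cells, which is one way to read the claim that reproduction proceeds "in a highly parallel manner."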

Again, this construction has not been rigorously formalized; however,
there is no essential mathematical difficulty in proving the existence
of a d_P that will behave in the manner indicated above, activating
different portions of d_Γ as required and programming itself into some
of the unprogrammed cells of the AMS. The most difficult problem in
achieving such a self-reproducing automaton is probably that of
attaining the proper coordination into a single, universal automaton of
the cells that descend from C_0. This author suspects that even this can
be solved in a relatively simple way.

The final sections of this chapter leave formalities aside and merely
speculate on the usefulness of such results to artificial intelligence.

HIERARCHICAL,  SELF-ORGANIZING,  AND 
EVOLUTIONARY  SYSTEMS 

Conditions 


This section briefly describes some of the ways in which large,
multiprocessing systems may eventually be used in AI research. The
emphasis is particularly on “hierarchical,” “self-organizing,” and
“evolutionary” systems. Before beginning, the reader should be warned
that there are still no comprehensive theories or definitions for the
nature of these systems (especially for the latter two). Rather, there are
a number of partial results and guidelines, primarily concerned with
hierarchical systems. And the little experience so far obtained with
self-organizing and evolutionary programs has been largely disenchanting.

Nevertheless, it is the present author’s belief that these systems
may eventually be very valuable to AI researchers, provided two
conditions can be satisfied:

1. First, there is a hardware requirement. These systems may involve
rather sizable complexes of computers; and it would be good if
they were inexpensive.

2.  Second,  we  must  overcome  the  misconception  that  these  systems 
are  essentially  incompatible  with  the  “reasoning-program”  approach  (see 
Chapter  3),  and  begin  to  investigate  the  possibilities  of  “hybrid” 
(hierarchical,  self-organizing,  evolutionary  and  reasoning)  programs. 

While it is not the purpose of this book to discuss hardware, there
are encouraging signs in that field of computer science. For example,
Culver and Mehran (1971) suggested that the use of laser technology
may eventually allow a computer to perform a logic operation in a
time span on the order of picoseconds (10^-12 seconds); holographic
storage techniques (again, “laser technology”) may eventually make it
possible for computer memories to store millions of bits of information
per square inch (Hunt, Elser, and Wolf, 1970). At any rate, we can
ignore the hardware condition and try to assume within reasonable
bounds that it can be met successfully. This section is intended to show
that the second condition can be met (that is, to suggest some “hybrid”
programs that AI researchers may eventually investigate with profit),
and possibly to restore some enchantment to the study of self-organizing
and evolutionary systems.

Hierarchical  Systems 

Many types of “hierarchical” systems have been encountered
throughout this book. In particular, the reader may recall the
discussion of PLANNER (Chapter 6), the “hierarchy of visual perception
systems” described in Chapter 5, “hierarchies of languages” discussed in
Chapter 7, and the “economy of invention” hierarchy suggested in
Chapter  3.  In  general,  a  hierarchical  system  is  an  ordered  collection  of 
machines  (systems,  programs,  procedures,  processes,  etc.).  We  may 
speak  of  the  type  of  “order”  involved  as  determining  the  “form”  and 
the  “nature”  of  the  hierarchy,  which  may  be  different  for  different 
hierarchies. The form of most systems that are considered to be
hierarchical corresponds to either a string, a tree, a lattice (see
Fig. 8-13) or perhaps to some cyclical variation on these forms. The
nature of a hierarchical system corresponds to the physical meaning of
the ordering between its machines, the factors of which may include
time, energy, composition, construction, information, and control. These
factors may be explained by noting that machines operate in time,
transform energy, may be made up of other machines, may make other
machines, may process information and send it to other machines or
people, and may control the behavior of other machines (by programming
them, altering their environments, starting them, unplugging them,
etc.).

Figure 8-13. (a) String, (b) tree, (c) lattice.

Following are two brief examples of ways the “hierarchical systems”
concept is of use to computer science and artificial intelligence.

EXAMPLE 8-2. MEMORY SYSTEMS. As mentioned in Chapter 2,
a memory system is a means of storing and retrieving information
(data structures), and may typically be described by
reference to its qualities of size (number of bits of information it
can store) and access time (time necessary to determine the
bits held at a particular place of storage in the memory). For a
given memory system these qualities are directly related: The
larger the memory size, the greater is the access time. One of
the earliest hierarchical systems investigated by computer
scientists (it was suggested by von Neumann) is the hierarchical
memory system. Its value results from the fact that the utility
of a given data structure varies with time: When a data structure
is being used by (or as) a program, it has high utility, whereas
otherwise its utility is very low, corresponding to the probability
with which it may be used in the future. A hierarchical memory
system consists of a lattice of memory systems, each capable of
supplying data structures to, or accepting data structures from,
its parents: The highest member of such a system is the “core
memory,” used by the computer to store the data structures it
is currently using; other members of a typical system may be a
magnetic “disk” or “drum,” a magnetic tape system, and perhaps
a holographic storage system. The core memory may
hold  10®  bits,  with  an  access  time  on  the  order  of  microseconds, 
while  the  holographic  storage  system  may  hold  10'"  bits,  with  an 
access  time  on  the  order  of  seconds  (see  Katzan,  1971;  Gentile 
and  Lucas,  1971;  and  Arora  and  Gallo,  1971). 
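
The idea that a data structure should move toward fast memory while its utility is high can be illustrated with a minimal sketch. It assumes only two levels, a least-recently-used demotion rule, and arbitrary access costs; all of these, along with the class name `TwoLevelMemory`, are illustrative choices rather than a description of any system cited above.

```python
from collections import OrderedDict


class TwoLevelMemory:
    """A toy two-level hierarchy: a small fast store backed by a large slow one."""

    def __init__(self, fast_capacity, fast_cost=1, slow_cost=1000):
        self.fast = OrderedDict()   # stands in for core memory
        self.slow = {}              # stands in for disk, drum, or tape
        self.fast_capacity = fast_capacity
        self.fast_cost = fast_cost  # hypothetical access costs, arbitrary units
        self.slow_cost = slow_cost
        self.elapsed = 0

    def store(self, key, value):
        self.slow[key] = value      # new structures start in the slow level

    def fetch(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)       # mark as most recently used
            self.elapsed += self.fast_cost
            return self.fast[key]
        value = self.slow.pop(key)           # raises KeyError if never stored
        self.elapsed += self.slow_cost
        self.fast[key] = value               # promote: its utility is now high
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value   # demote the least recently used
        return value


memory = TwoLevelMemory(fast_capacity=2)
for name in "abc":
    memory.store(name, name.upper())
for name in ["a", "a", "b", "c"]:
    memory.fetch(name)
print(memory.elapsed, list(memory.fast), sorted(memory.slow))
```

Repeated fetches of the same structure are cheap in this sketch, while a structure that has not been used recently drifts back to the slow level, mirroring the time-varying utility described in the example.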

EXAMPLE 8-3. PLANNER’S HIERARCHICAL CONTROL SYSTEM.
Chapter 6 gave a brief description of PLANNER, a programming
language for writing plans (Hewitt, 1968 et seq.). A plan written
in PLANNER consists of a collection of theorems that represent
procedures for manipulating assertions. When a theorem
(procedure) is used, it may affect other procedures (create them or
manipulate them) or it may “call” another procedure (i.e., cause
that procedure to be used). When a theorem is being used, we
may think of it as having “control” of the actions currently being
taken by the computer, and when it calls another theorem, we
may think of it as transferring control to that theorem. The
implementation of a plan starts by calling one of its theorems and
continues as theorems manipulate data structures, transfer
control, etc. Theorems in PLANNER are goal-directed procedures:
Their purpose is generally to establish something as a fact.
Consequently, the way in which they transfer control may be
“conditional”: Theorem A may transfer control to theorem B;
if theorem B (and those theorems to which it transfers control)
fails to achieve its goal, control automatically backs up to
theorem A so that it can (hopefully) do something else.
PLANNER’s hierarchical control structure enables it to keep
track of the hierarchy of theorem calls and control transfers
caused by the implementation of a plan.
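
The conditional transfer of control described in this example can be sketched in a few lines. The sketch below is not PLANNER syntax; it assumes that a "theorem" is simply a list of alternative subgoal lists for a goal, and the function name `prove` and the sample goals are hypothetical.

```python
def prove(goal, facts, theorems, depth=0, trace=None):
    """Try to establish `goal`; return True on success, False on failure.

    `theorems` maps a goal to a list of alternatives, each a list of
    subgoals. When one alternative fails, control returns to the caller,
    which tries the next alternative. Cyclic rule sets are not handled
    in this sketch and would recurse without end.
    """
    if trace is not None:
        trace.append("  " * depth + goal)   # record the hierarchy of calls
    if goal in facts:
        return True
    for subgoals in theorems.get(goal, []):
        if all(prove(sub, facts, theorems, depth + 1, trace) for sub in subgoals):
            return True
        # This alternative failed: control has backed up to this theorem,
        # and the loop moves on to the next alternative, if any.
    return False


theorems = {
    "enter-room": [["door-open"], ["window-open"]],
    "door-open": [["have-key"]],
    "window-open": [["have-ladder"]],
}
trace = []
print(prove("enter-room", {"have-ladder"}, theorems, trace=trace))
print("\n".join(trace))
```

The indented trace shows the hierarchy of calls, including the failed attempt through "door-open" before control backs up and the second alternative succeeds.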

As Holland (1970) pointed out, the chief value of the “hierarchical
systems” idea is that it gives a way of describing large systems
that is far more practical than the “state-transition” function approach
of automata theory. A large system (e.g., the human brain) may have
10^10 components; if each component has two states, the system will
have 2^(10^10) possible states, and an explicit description of the
state-transition function for a system of this size is not possible. Yet
it may be that the components of the system are organized into a
hierarchy of, say, 11 levels of “blocks,” in which each block is divided
into 10 lower-level blocks (a tree with a branching factor of 10 and a
depth of 11), the lowest-level blocks being the “components” of the
system. Should this be the case, then:

Even for a device with as many as 10^10 components, one need only
make  a  selection  at  each  of  10  levels  to  uniquely  locate  any  given 
component.  And,  assuming  a  relevant  functional  division,  much  will 
be  learned  of  the  effect  of  that  component  by  observing  the  use  or 
function  of  the  blocks  involved  .  .  .  For  devices  of  this  complexity, 
hierarchical  descriptions  offer  almost  the  only  avenue  to  detailed 
understanding.  (Holland,  1970.) 
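
The arithmetic behind this comparison can be checked with a short sketch. It assumes components are numbered from 0 and that each level's choice is one decimal digit of that number; the function name `locate` is hypothetical.

```python
import math


def locate(component, branching=10, levels=10):
    """Return the block selected at each level to reach `component`."""
    if not 0 <= component < branching ** levels:
        raise ValueError("component index out of range")
    path = []
    for level in range(levels - 1, -1, -1):
        block, component = divmod(component, branching ** level)
        path.append(block)
    return path


print(locate(1234567890))  # ten selections, one per level

# Size of an explicit state description, for contrast: the number of
# decimal digits needed just to write down 2**(10**10), found with
# logarithms rather than by building the number itself.
digits = math.floor(10**10 * math.log10(2)) + 1
print(digits)
```

Ten small selections suffice to reach any one of the components, whereas merely writing down the number of possible system states would take on the order of billions of digits.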

The reader who wishes to pursue the subject of hierarchical
descriptions is also encouraged to see van Emden (1970) and Pratt
(1969a,b). Mesarovic, Macko, and Takahara (1970) presented a rather
comprehensive treatment of general hierarchical systems; Miller,
Galanter, and Pribram (1960) presented an early, but still not
superseded, discussion of the hierarchical nature of human plans; Gertz
(1970) discussed hierarchical associative memory systems for parallel
processing; Fikes and Nilsson (1971) discussed the hierarchical nature
of the systems used in the robot STRIPS.


Self-Organizing  Systems 

A  collection  of  organs  performing  functions  in  relation  to  each 
other  is  called  an  organism;  similarly,  a  collection  of  people  who  solve 
problems  or  perform  functions  in  relation  to  each  other  is  often  called 
an organization, especially if the relations and functions involved seem
to  be  relatively  unchanging. 

By a self-organizing system (SOS) is meant a collection of machines
capable of solving problems by forming into (perhaps temporary)
organizations. Essential to the operation of any self-organizing system is
an environment, or collection of problems to be solved and patterns to
be recognized (see Chapter 3). Two paradigms for the concept of
“self-organizing system” are suggested.

The  first  is  the  standard  paradigm,  which  received  a  great  deal  of 
investigation  in  the  early  1960s  (see  Yovits  and  Cameron,  1960;  Yovits, 
Jacobi,  and  Goldstein,  1962).  The  viewpoint  of  this  paradigm  is  to 
see  a  self-organizing  system  as  made  up  of  parts,  one  of  which  is  a 
control  and  governs  the  organization  of  the  other  parts  (often  called 
generators,  characteristic  functions,  predicates,  parametric  functions, 
etc.); these parts begin with some initial (possibly empty) set of
relations to each other. The function of control is to modify that set with
experience, to make a structure of the parts which reflects its current
knowledge and inference about the environment. (Note that “control”
might be an “imaginary part” in reality distributed among the other
parts.) Thus, for example, a Perceptron begins with a structure of
predicates, and it alters that structure by changing the “weight relation”
between its predicates (see Chapter 5). In general, the structure of
parts produced by the control of an SOS at any given moment is to be
interpreted as the SOS’s current strategy for solving the problems
presented by its environment. Thus, we have already encountered several
simple  types  of  self-organizing  system:  Pandemonium,  the  Samuel 
Checkers  Player,  and  the  Waterman  Poker  Player  are  all  basic  examples 
of  programs  designed  to  modify  structures  (i.e.,  “organize  themselves”) 
so  as  to  solve  a  problem. 

It is probably fair to say that this paradigm has gone out of vogue.*
The reason is that it gives few guidelines for designing self-organizing
systems with an aptitude for specific real-world environments, leaving
the burden of finding suitable predicates and controls to the human
designer. The paradigm is somewhat tautological and, as a consequence,
it is still being used, but many researchers no longer bother to
refer to it. (Even so, the collections edited by Yovits et al. (1962)
contain many insightful papers, and are worth reading.) Rather, it has
been replaced by a general interest in the question of “machine
induction” (learning of evaluation functions, pattern recognition, etc.)
and has been pursued in three directions: statistical decision theory
(Duda and Hart, 1973), automatic program writing (Chapter 6), and
grammatical inference (Chapter 7).

* If such a vogueless expression is permissible.

The present author will not pretend to lift the burden or to
eliminate the tautology. However, it is possible that another paradigm,
which emphasizes a different aspect of self-organization, may stimulate
renewed interest in the subject and eventually lead to more results. The
paradigm suggested here is to see a self-organizing system as having
the following characteristics: First, it will consist of parts, each part
being a problem-solving device, and “control” will be distributed
throughout each of the parts; second, the parts of a self-organizing
system will share a language capability for some language. Each part will
be able to communicate with other parts of the system, and the actions
of each part will be influenced by the messages it receives. The design of
a self-organizing system should focus on two aspects: the nature of the
individual components and the language they use to communicate with
each other. Let us give an example of a way in which this kind of
self-organizing system might be useful in the real world.

EXAMPLE 8-4. COMPUTER-DRIVEN VEHICLES. R. A. Schmidt
(1971) presented convincing arguments that if an automated
system for the transportation of people by automobile is
developed, each automobile in the system should be an artificial
intelligence, with its own computer and visual perception
system. His basic reasons are as follows: First, the automobile
(call it an “automatic car” or a “robot chauffeur”) should be
capable of being introduced into the existing transportation
system without requiring an extensive (and expensive)
road-rebuilding project. Thus, the robot chauffeur should be capable of
traveling over ordinary roads and highways, just as
people-driven cars do. Second, use of the robot chauffeur should be as
safe as the use of ordinary cars (preferably safer). Finally, there
is good reason why an automated transportation system should
still be (at least partially) based on the use of automobiles;
namely, the use of automobiles gives people a greater “freedom
of mobility” than seems achievable otherwise. Other systems
require the use of terminals, spaced relatively far apart, at which
people may enter and leave the transportation system; usually
the people involved may expect the terminals to be some
distance from their own destinations. “This fact, along with the
nuisance of scheduling present in other systems, is what induces
most people to use automobiles, in spite of parking problems,
congestion, delay and the host of other problems involved”
(Schmidt, 1971, p. 139).

Given these reasons, Schmidt noted that the difficulty involved in
automobile transportation systems is that danger areas (“incidents”)
are highly localized and variable. They may be caused by anything,
ranging from a child running across a street, to another car with a flat
tire or an erratic driver, or a hubcap lying in the road; and they may
appear and disappear quickly. Given that we do not embark on an
extensive project of building new, automated roads (and such a road system
would still be susceptible to incidents), the information necessary to
discover and avoid incidents must be obtained visually, and each car
must contain its own robot chauffeur.†

The present author’s suggestion (to illustrate the “communication”
paradigm for self-organizing systems) is that it would be desirable for
each of the automatic cars to be capable of communicating with the
others in its area (say, on a special communications band) in a language
specifically designed for expressing information about danger areas,
incidents, roads, automobiles, etc. Certain kinds of automobiles would
have the ability to make use of special sentences in the language. Thus,
an ambulance might tell other cars it is heading along a certain road,
at a certain speed and location, toward a certain destination; a car
stalled on the road ahead might be able to reply with a warning to slow
down. Again, suppose an accident occurs on lane 1 of a highway at
point A (Fig. 8-14); automobiles traveling on lane 2 should be able
to relay a warning message back to automobiles at point B in lane 1,
telling them to slow down. As it is currently done, the warning message
that is relayed back consists of the brakelight signals emitted by the cars
slowing down in lane 1; the signal does not travel as fast (by the
present author’s reasoning; it would be interesting to perform tests,
using, of course, human drivers) to point B as it would if it were also
carried by the cars in lane 2. (This idea could, of course, be
implemented without the use of robot chauffeurs, using lights instead of
electric signals.)

Figure 8-14. Road situation.

† Note that Schmidt’s own assessment of the likelihood of achieving safe
robot chauffeurs is pessimistic. He concludes that driving a car requires the full
intellectual abilities of judgment and learning possessed by a human being:
“. . . future research in computer control would be more profitable in areas
such as industrial automation, remote exploration, or man-machine systems. . . .”

Evolutionary  Systems 

An evolutionary system is a machine (program, procedure, etc.)
that develops submachines according to their ability to perform tasks
(solve problems, recognize patterns) in an environment produced by the
real world. Generally, an evolutionary system is considered to make use
of a “blind generation procedure” and an “environment-oriented
selection procedure.” A blind generation procedure is a method of creating
new submachines that is partially independent of, and possibly
inconsistent with, the environment of the evolutionary system. Thus, it is
(at least partially) “random” in the sense of Knuth (1969b) with respect
to its environment. An environment-oriented selection procedure is a
method by which the evolutionary program automatically rejects those
subprograms that its experience shows to be incompatible with the
tasks required by the environment.

An early discussion on the necessity for blind generation and
environment-oriented selection procedures is given by Campbell, who
concluded:


A blind-variation-and-selective-survival process is fundamental
to all inductive achievements, to all genuine increases in knowledge,
to all increases in fit of system to environment.

The processes which shortcut the full
blind-variation-and-selective-survival process are in themselves inductive
achievements containing wisdom about the environment achieved originally
by a blind-variation-and-selective-survival process.

In addition, such substitute processes contain in their own
operation a blind-variation-and-selective-survival process at some level.
(Campbell, 1960.)

As  Nilsson  (1971)  pointed  out,  the  main  trick  is  to  design  generation 
and  selection  procedures  that  “search  at  the  highest  level  permitted  by 
the  available  information  about  the  problem  and  about  how  it  might 
be  solved.”  It  is  to  be  noted,  therefore,  that  the  subprograms  developed 
by  an  evolutionary  system  may  vary  in  the  “blindness”  they  display. 
Thus  a  really  intelligent  system  might  first  develop  programs  for  symbolic 
integration  similar  to  Slagle’s  (1963),  later  develop  programs  similar 
to Moses’ (1967), and finally (eons later?) develop programs embodying
the Risch (1969) algorithm (see Chapter 3). Also, note that it is
possible for the evolutionary system to vary its own “blindness,” to
change the “level” at which it conducts its search (see Holland, 1960
et seq.). Even so, the present author agrees with Campbell that it is
necessary for such a system to preserve some blindness because (as
stated in Chapter 3) “a real-world environment has no known, complete
finite description or prediction.”

To the present author’s knowledge, there have been only two fully
general attempts to program evolutionary systems that would possess
artificial intelligence. These were the attempts of Friedberg et al. (1958,
1959) and Fogel et al. (1966). Neither attempt had any success
comparable to that obtained by other, nonevolutionary approaches to
artificial intelligence, and we might therefore categorize them as
“instructive failures.” Friedberg’s program attempted to develop
subprograms written in machine language for a very simple computer. The
generation procedure produced random (64-instruction) subprograms
from those that had been previously produced. The selection procedure
was used to assign success-or-failure credit to individual instructions
used in these subprograms, and the credit given to an instruction was
used to determine the likelihood that the generation procedure would
use it in developing future subprograms. Similarly, Fogel’s program
was designed to develop subprograms corresponding to the
state-transition diagrams of relatively simple finite-state machines (all
machines developed had fewer than 30 states and input-output alphabets of
no more than 8 symbols). These subprograms were used to make
predictions of variously chosen sequences, and success-or-failure ratings
were assigned to each subprogram. The generation process consisted of
“mutating” a given subprogram to produce a new subprogram denoting
a finite-state machine differing from its “parent” by an output symbol,
a state transition, the number of states, or the initial state. “Parent”
and “offspring” subprograms would then have their predictions compared
for the same sequences and the subprogram with the best predictive
capability would be retained (selected) and used in future mutation
processes while the other would be rejected.
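
A mutate-and-compare loop of the kind just described can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of Fogel's program: the machine representation, the scoring rule (count of correct next-symbol predictions), the omission of mutations that change the number of states, the acceptance of ties, and all function names are choices made here for brevity.

```python
import random


def make_machine(n_states, alphabet, rng):
    """Random finite-state machine: (state, symbol) -> (next state, output)."""
    table = {
        (s, a): (rng.randrange(n_states), rng.choice(alphabet))
        for s in range(n_states)
        for a in alphabet
    }
    return {"start": 0, "n_states": n_states, "table": table}


def score(machine, sequence):
    """Count how often the machine's output equals the next symbol."""
    state, hits = machine["start"], 0
    for current, upcoming in zip(sequence, sequence[1:]):
        state, prediction = machine["table"][(state, current)]
        hits += prediction == upcoming
    return hits


def mutate(machine, alphabet, rng):
    """Copy the parent, redrawing one output, one transition, or the start state."""
    child = {**machine, "table": dict(machine["table"])}
    kind = rng.choice(["output", "transition", "start"])
    if kind == "start":
        child["start"] = rng.randrange(machine["n_states"])
        return child
    key = rng.choice(sorted(child["table"]))
    next_state, output = child["table"][key]
    if kind == "output":
        output = rng.choice(alphabet)
    else:
        next_state = rng.randrange(machine["n_states"])
    child["table"][key] = (next_state, output)
    return child


def evolve(sequence, alphabet, n_states=4, generations=500, seed=0):
    """Repeatedly mutate the parent; keep the child if it predicts at least as well."""
    rng = random.Random(seed)
    parent = make_machine(n_states, alphabet, rng)
    history = [score(parent, sequence)]
    for _ in range(generations):
        child = mutate(parent, alphabet, rng)
        if score(child, sequence) >= score(parent, sequence):
            parent = child
        history.append(score(parent, sequence))
    return parent, history


sequence = "011011011011011011"
best, history = evolve(sequence, "01")
print(history[0], history[-1], "out of", len(sequence) - 1)
```

Because each step changes at most one entry of the machine, the recorded scores can only creep upward or stay level, which makes the small-step character of this style of search easy to observe.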

Besides  the  fact  that  these  evolutionary  systems  produced  only  very 
small  subprograms,  they  shared  an  essential  limitation  in  method; 
namely:  The  generation  and  selection  procedures  used  by  these  methods 
were  restricted  to  taking  very  small  steps  through  the  space  of  possible 
subprograms.  A  change  in  a  single  machine  instruction  or  state  of  a 
finite-state  machine  will  only  very  rarely  make  any  significant,  desirable 
change  in  the  behavior  of  the  machine  (subprogram);  the  likelihood  of 
its  doing  so  decreases  with  the  size  of  the  subprogram  being  mutated. 
We  may  expect  any  machine  with  a  general  artificial  intelligence  to  have 
a  huge  number  of  states  (say,  greater  than  10"  “®)  and  its  description 
in an ordinary machine language to involve a huge number of
instructions (10" . . . ?). A further complication is the phenomenon of
evolutionary stagnation (Bremermann, 1962), also called the “Mesa
phenomenon” (Minsky, 1963). It may be the case that a given subprogram
could be mutated to form a better subprogram, but that such a mutation
would require several “submutations” of which any partial combination
would only produce a worse subprogram. If such a mutation were to
occur, it would be necessary for all of its submutations to occur
simultaneously. The probability that this would happen is the product of
the probabilities that each of the submutations would happen. Thus,
the given subprogram may tend to “stagnate” where it is.
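
The product rule behind stagnation is easy to make concrete. The sketch below assumes the submutations are independent, which is what taking the product of their probabilities implies, and uses a made-up figure of one chance in a thousand per submutation; the function name `joint_probability` is hypothetical.

```python
import math


def joint_probability(submutation_probs):
    """Probability that all independent submutations occur in the same trial."""
    return math.prod(submutation_probs)


# Hypothetical figures: each submutation has a 1-in-1000 chance per trial.
for k in (1, 2, 3, 4):
    p = joint_probability([1e-3] * k)
    print(k, "submutations:", p, "about", round(1 / p), "trials expected")
```

Each additional required submutation multiplies the expected waiting time by the same large factor, so even a handful of jointly required changes can leave a subprogram effectively stuck.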

Holland (1960 et seq.) presented a detailed scheme for the
implementation of an evolutionary system that would not be so limited,
using “iterative circuit computers.” In particular, he suggested that the
evolutionary system describe its submachines hierarchically and that
the generation (mutation) procedure used be performed on the
hierarchical descriptions (perhaps in Lispish notation) for the submachines,
not on the submachines themselves. In addition, he suggested that
the generation and selection procedures used by the evolutionary system
should themselves be hierarchically described and (recursively) evolved
by the system. Minsky (1963, pp. 434-435) made similar
recommendations: “No scheme for learning, or for pattern recognition, can have very
general utility unless there are provisions for recursive, or at least
hierarchical, use of previous results.” Again citing McCarthy (1956):
“The enumeration of partial recursive functions should give an early
place to compositions of functions that have already appeared.”

The implementation of Holland-like evolutionary systems must
await the satisfaction of the hardware condition cited at the beginning
of this section. (Should these systems ever be implemented, it might be
desirable to “prime” them with subprograms and generation and
selection procedures that were already somewhat sophisticated.) However,
two types of evolutionary systems may be of more immediate interest
to AI researchers.

Variable-Valued  Reasoning  Programs,  Chapter  6  discussed  the 
possibility  that  reasoning-programs  might  change  their  rules  of  inference 
and  logical  calculi.  Whether  this  is  desirable  is  hard  to  say;  Certainly  it 
would  seem  that  if  it  were  done,  the  reasoning  program  should  do  it 
“reasonably.”  Still,  keeping  the  preceding  comments  in  mind,  a  certain 
amount  of  “blind”  variation  in  the  values  of  the  reasoning  program 
might  be  good.  At  any  rate,  it  would  be  desirable  that  the  reasoning 
program  not  sacrifice  the  generality  of  its  phenomena  language,  how¬ 
ever  much  it  changed  its  efficiency  at  expressing  certain  concepts.  Our 
rationale  for  giving  reasoning  programs  such  capabilities  is  the  follow¬ 
ing:  the  program  has  to  be  able  to  form  beliefs  about  its  environment 
and  recognize  errors  in  these  beliefs.  It  should  also  be  able  to  correct  the 
source of these errors, whenever possible. Often the source will be another
belief, but in some cases it might be an inference rule; therefore
the program should be able to change its inference rules too. Again, the ultimate intelligent
program  should  be  able  to  understand  that  it  is  a  program  and  under¬ 
stand  the  purposes  of  its  subprograms.  The  program  should  be  able  to 
debug,  rewrite,  and  extend  itself,  in  order  to  adapt  to  its  environment. 
And  it  should  be  able  to  perceive  itself  as  a  part  of  that  environment. 

Networks of Question Answerers. The possibility that networks of
question answerers and protocol analyzers might demonstrate an ability
to  improve  their  intelligence  was  discussed  in  the  last  section  of 
Chapter  7. 

One  objection  to  the  utility  of  evolutionary  programs,  even  should 
they  be  successful  in  producing  general  artificial  intelligence,  concerns 
the  issues  of  understanding  and  control.  To  quote  McCarthy  and  Hayes: 

.  .  .  (the  evolutionary  approach)  has  had  no  substantial  success  so 
far,  perhaps  due  to  inadequate  models  of  the  world  and  of  the  evolu¬ 
tionary  process,  but  it  might  succeed.  It  seems  dangerous  since  a 
program that was intelligent in a way its designer didn’t understand
might  get  out  of  control.  (McCarthy  and  Hayes,  1968.) 

The  present  author  agrees  in  general  with  this  criticism,  but  thinks 
it  can  be  equally  well  applied  to  the  strict  reasoning-program  approach. 
It  should  be  stressed  that  for  any  intelligent,  programmed  machine 
(whether  of  the  strict-reasoning  or  evolutionary  type),  its  designer  will 
always  be  capable  of  examining  the  complete  printout  of  all  programs 
and  other  data  in  that  machine,  at  least  up  to  the  moment  the  machine 
“gets  out  of  control.”  Thus,  he  will  be  able  to  follow  the  development 
of  any  subprograms  in  an  evolutionary  machine,  or  the  proofs  of  any 
theorems  by  a  reasoning  machine,  to  whatever  extent  he  desires.  In 
either  case  the  amount  of  data  involved  might  be  enormous,  so  he 
might  need  to  make  use  of  other  machines  to  examine  it  (“reasoning 
checkers”; see Chapter 9). It does not seem that the “reasoning patterns”
of reasoning programs will necessarily be more perspicuous than
the  “evolutionary  patterns”  of  evolutionary  programs.  And,  if  either 
type  of  machine  is  allowed  to  interact  with  a  real-world  environment, 
the  designer  will  not  be  able  to  control  precisely  the  information  that 
will  come  into  it.  Thus,  we  do  not  expect  that  he  will  be  able  to  control 
completely  the  actions  of  either  type  of  machine.  The  only  question  is 
whether  the  designer  will  be  able  to  foresee  that  his  intelligent  machine  is 
going out of control, before it actually does so, and there does not seem
to  be  a  guarantee  that  he  can  have  such  an  ability,  for  either  machine. 
The  best  he  can  do  is  to  predesign  the  machine  so  that  its  “freedom  of 
will” (which is basically “freedom of action”) can never exceed certain
bounds. We expect that this can be done, within limits, for either
the reasoning machine or the evolutionary machine.

SUMMARY

In this chapter we have seen that the abilities of machines to do
things people would normally say require intelligence are complemented
by abilities to do things people would normally say require “life,”
namely:  self-reproduce,  evolve,  self-organize,  self-diagnose,  and  self- 
repair.  These  abilities  may  in  the  future  be  highly  relevant  to  the  search 
for  artificial  intelligence  and  to  the  development  of  future  industries  and 
technologies. 


NOTES 

8-1. A graph G is an ordered pair (N,R), where N is a set of nodes and
R is a binary relation on N. If R(x,y) holds for a given x and y in N, then
we say that “x is connected to y,” or “x connects to y,” under the relation
R. If for all x and y in N, R(x,y) implies R(y,x), then we say the graph
is bidirectional or bidirected (see Chapter 3).
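
As a minimal sketch of the definition in Note 8-1, a graph can be held in Python as a pair of sets. The function names connects_to and is_bidirectional and the sample graph are assumptions made for illustration.

```python
def connects_to(graph, x, y):
    """True if R(x,y) holds, i.e., x connects to y under the relation R."""
    nodes, relation = graph
    return (x, y) in relation

def is_bidirectional(graph):
    """True if R(x,y) implies R(y,x) for all x and y in N."""
    nodes, relation = graph
    return all((y, x) in relation for (x, y) in relation)

# A hypothetical three-node graph G = (N, R).
N = {"a", "b", "c"}
R = {("a", "b"), ("b", "a"), ("b", "c")}
G = (N, R)
```

Here G is not bidirectional, because R(b,c) holds while R(c,b) does not; adding the pair (c,b) to R would make it so.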

8-2. A group G is an ordered pair (E,*), where E is a set of elements
(or nodes, etc.) and * is the group operation, a function on E × E to E
which is such that (a) * is associative, i.e., x*(y*z) = (x*y)*z for all x, y,
and z in E; (b) there exists an identity element e in E which is such that,
for any given x in E, x*e = e*x = x; (c) for each x in E there is an inverse
element, denoted x^-1, in E such that x*x^-1 = e. If A is a subset of E,
then by A* we denote the set of all elements x such that

    x = y_1*(y_2*( ... (y_{n-1}*y_n) ... ))

for some finite n, and for some choice of the y_i such that each y_i is an
element of A. (Cf. the equivalent notion of sets of strings, given in Chapter
6.) For any group, E* = E. (Why?)

A group is said to be finitely generated if there is a finite set G', called
a generator set for G, such that (G')* = E. Finally, a group G is said to be
Abelian, or commutative, iff x*y = y*x for all x,y in E.

A  given  group  may  possess  more  than  one  generator  set;  for  example, 
the  Abelian  group  corresponding  to  the  two-dimensional  Cartesian  (square) 
grid  is  generated  by  the  set 

{(1,0),  (0,1),  (-1,0),  (0,-1)} 

and by the set

{(1,1), (0,-1), (-1,0)}

as may be seen by comparing the following two diagrams:

[Two diagrams, not reproduced: the nine grid points from (-1,-1) to (1,1), connected by arrows.]

(The  group  operation  is  vector  addition.) 
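
One way to check the claim without the diagrams is to show that every element of each generator set is a finite sum of elements of the other; each set then generates everything the other does. The Python sketch below does this by brute force. The names add, sums_up_to, G1, and G2 are assumptions made for illustration, and the bound of two terms is simply what turns out to be enough for these two sets.

```python
def add(u, v):
    """The group operation: vector addition on pairs of integers."""
    return (u[0] + v[0], u[1] + v[1])

def sums_up_to(generators, max_terms):
    """All vectors expressible as a sum of 1..max_terms generators."""
    reached = set()
    frontier = set(generators)          # sums of exactly one generator
    for _ in range(max_terms):
        reached |= frontier
        # Extend every sum of n generators to sums of n + 1 generators.
        frontier = {add(u, g) for u in frontier for g in generators}
    return reached

G1 = {(1, 0), (0, 1), (-1, 0), (0, -1)}
G2 = {(1, 1), (0, -1), (-1, 0)}

# Each generator of one set is a sum of at most two generators of the other.
g1_from_g2 = G1 <= sums_up_to(G2, 2)
g2_from_g1 = G2 <= sums_up_to(G1, 2)
```

For instance, (1,0) = (1,1) + (0,-1) and (0,1) = (1,1) + (-1,0), while (1,1) = (1,0) + (0,1).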

8-3. Most work has been done with cellular automata whose spaces are
described by Abelian groups; the main results have all been extended to the
(slightly) more general case of spaces equivalent to “homogeneous tessellations”
(e.g., A. R. Smith, 1969). Not much work has been devoted to
spaces described by nonhomogeneous tessellations, and very little has been
done  on  spaces  equivalent  to  graphs  in  general,  although  certain  results  have 
been  obtained;  for  example,  any  finite  cellular  automaton  (i.e.,  containing  a 
finite  number  of  cells)  is  describable  by  a  single  finite-state  automaton. 
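
The last remark of Note 8-3 can be made concrete with a small Python sketch: take each global configuration of a finite cellular automaton as one state of a single machine, and tabulate the global transition. The ring layout, the XOR rule, and the names global_step, as_single_automaton, and xor_rule are assumptions chosen for illustration; the resulting machine has no external input.

```python
from itertools import product

def global_step(config, rule):
    """Apply the local rule to every cell of a ring simultaneously."""
    n = len(config)
    return tuple(
        rule(config[(i - 1) % n], config[i], config[(i + 1) % n])
        for i in range(n)
    )

def as_single_automaton(n_cells, cell_states, rule):
    """Transition table of the one finite-state machine whose states are
    the global configurations of the n-cell automaton."""
    return {
        config: global_step(config, rule)
        for config in product(cell_states, repeat=n_cells)
    }

def xor_rule(left, centre, right):
    # Hypothetical local rule: a cell becomes the XOR of its two neighbours.
    return left ^ right

# Three binary cells give a single automaton with 2**3 = 8 states.
table = as_single_automaton(3, (0, 1), xor_rule)
```

The table grows exponentially with the number of cells, so the equivalence is of theoretical rather than practical use.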

8-4.  The  space  itself  of  the  process  can  properly  be  said  to  be  “self- 
affecting.”  This  raises  some  intriguing  possibilities.  For  example,  here  is  a 
self-reproducing  space:  Let  the  process  make  use  of  a  countably  infinite  set 
of cells, each cell referred to by a unique pair of integers (i,j) (positive or
negative), every such pair being used to refer to a cell; let the structure, or
space, of the process always be described by means of four relations (L, A,
B, R) between certain cells; let the initial structure σ_0 at time t_0 be such that

    r(x,y) iff x*g_i = y

for some g_i in the set

    G' = {(1,0), (0,1), (-1,0), (0,-1)}

where (*) signifies the operation of vector addition; in particular let

    y = x*(1,0)  ⇒ R(x,y)
    y = x*(0,1)  ⇒ A(x,y)
    y = x*(-1,0) ⇒ L(x,y)
    y = x*(0,-1) ⇒ B(x,y)

Thus, the initial space structure of the process is the two-dimensional Cartesian
grid. Finally, let each cell be described by the same “structure-processing”
machine M which, given the observed neighborhood-structure

[diagram not reproduced]

replaces it by the structure

[diagram not reproduced]

The figure below then shows the initial structure space σ_0 of the
process at time t_0 and the subsequent structure space σ_1 produced at time t_1,
all cells having operated simultaneously during the unit time-interval,
their actions being superposed in a logical manner. However, σ_1 is equivalent
to two Cartesian grids, σ_2 will be equivalent to four, and so on, as the reader
can easily prove. The space structure of this process is therefore self-reproducing.

8-5. How to prove this fact for an ams is briefly sketched, and the reader
can  fill  in  the  details.  The  basic  givens  are  as  follows:  There  may  be  an 
infinite  number  of  cells  in  an  ams,  but  each  cell  must  be  described  by  one 
of a finite number of (polycephalic) Turing machines; each cell has a finite
number  of  tapeheads,  whose  initial  locations  are  effectively  computable; 
each  square  initially  has  a  finite  set  of  tapeheads  scanning  it.  It  is  also  re¬ 
quired  that  this  set  be  computable,  both  as  to  the  number  and  as  to  the 
nature  (to  which  cell(s)  they  are  attached)  of  the  tapeheads  scanning  any 
given  square. 

An  inductive  argument  is  necessary.  These  conditions  imply  that  one 
can  compute  the  initial  configuration  of  tapeheads  and  symbols  in  any  finite 
set  of  squares  of  the  space.  We  will  assume  that  one  can  compute  the  con¬ 
figuration  of  tapeheads  and  symbols  in  any  finite  set  of  squares  of  the  space 
after a finite elapsed time t = k and describe how to show that it can consequently
be done for t = k + 1.

Let D be a finite set of squares in the space of the ams; let a,b,c, ... be
variables that range over the squares of the ams, and let x,y,z, ... be variables
that range over its cells. As a (descriptive, but essentially valid) notation,
let us then use

    D^k = assignment of symbols to squares of D at time t = k
    T^k = {x/x scans a square of D (at time t = k)}

The hypothesis should be that both D^k and T^k are computable for all t < k.
Also, let

    [...] = assignment of symbols to squares of
            D ∪ {a/p(a,D) < j}
            at time t = k - j

where p(a,D) is a metric giving the distance (shortest number of moves
required for a tapehead to go) from square a to a square of D. In general,
let

    [...] = {x/x scans [...] (at time t = k - j)}

and

    [...] = assignment of symbols to
            {a/x scans a for some x in [...]}
            at time t = k - j,

    [...] = {x/x scans [...] (at time t = k - j)}

etc. The notation can be easily extended for use in a complete proof.

To make the proof by induction, the major thing to note is that [...]
and [...] depend (in a computable way) on [...] and that

our  induction  hypothesis  guarantees  that  each  of  these  things  can  be  com¬ 
puted  (they  are  all  finite,  if  D  is  finite,  and  they  are  all  defined  for  time 
-k). 

8-6. The permission of zero delay in a machine is perhaps unrealistic;
automata theorists have also investigated the “middle ground” where zero
delay is allowed but is limited in the amount of space it can cover. Thus, we
might require (see Wagner, 1964, 1965) that all cells of the ams be spider
automata (i.e., polycephalic Tm’s) whose tapeheads can never be greater
than  a  fixed  distance  from  each  other.  Similarly,  the  Holland  (1960) 
iterative  circuit  computers  are  restricted  to  having  no  more  than  a  fixed 
number  of  Mealy  automata  in  any  given  chain  of  zero-delay  gates.  These 
restrictions  do  not,  however,  destroy  composition  universality  in  the  ams 
(see  Holland,  1970,  pp.  341-343). 

8-7.  From  the  philosopher’s  standpoint,  the  concept  of  self  can  be  under¬ 
stood  in  at  least  two  senses:  First,  self  as  the  essence  of  consciousness;  sec¬ 
ond,  self  as  the  image  that  consciousness  has  of  self.  Machines  such  as  those 
we  describe  as  “self-affecting”  are  composed  of  submachines  that  each 
operate  with  respect  to  an  “image”  of  the  other  machines.  The  simultaneous 
operation  of  all  machines  is  capable  of  changing  the  image  that  an  outside 
observer  might  have  of  their  (momentary)  totality — thus  the  name  “self- 
affecting.”  The  true  “self”  of  the  machine  (if  there  is  one)  presumably  does 
not  change. 

8-8.  It  should  be  noted  that  there  is  a  difference  in  the  definition  of  “con¬ 
struction  universality”  (as  used  by  von  Neumann)  and  “composition  uni¬ 
versality”  as  used  by  Holland.  A  universal  constructor  in  an  abc  is  defined 
to  be  able  to  produce  any  finite  configuration  of  elements  from  a  certain 
finite set of finite configurations of finite-state machines. These elements are
the “parts” both of the universal constructor itself and of all the machines
it can build. A universal constructor is not defined as being able to construct
or  simulate  the  construction  of  any  finite  configuration  of  elements  from  the 
set  of  all  finite  configurations  of  finite-state  machines:  As  the  present  author 
understands  it,  this  is  Holland’s  concept  of  composition  universality. 


EXERCISES 

8-1. Prove that an “infinite conflict-of-print-commands” problem cannot arise
in an ams, given that each square of space initially has only a finite number of
tapeheads  scanning  it. 

8-2. Let A and B be two machines, each engaged in performing some never-ending
task, with the additional feature that A is able to scan B, recognize whenever
B is not performing correctly, stop B, repair B, and then start B again, and
that B is able to do the same for A. Assume that A and B operate simultaneously
during discrete time intervals, and that each machine is able to detect and repair
a malfunction in the other machine during the single time interval in which it
occurs, unless the repairing machine breaks down itself during that time interval,
in which case neither will be repaired. If either machine is working correctly at
time t, then the probability is p that it will break down during the interval
until time t + 1; if neither machine is working correctly at time t, then it is
certain  that  they  will  not  be  repaired  at  time  t  +  1.  (a)  What  is  the  probability 
that  both  machines  will  break  down  during  the  same  time  interval?  (b)  What  is 
the  mathematically  expected  number  of  time  intervals  that  one  machine  would 
survive  alone?  (c)  What  is  the  mathematically  expected  number  of  time  intervals 
that  the  two  machines  will  survive  together? 

8-3. Define “nondeterministic cellular automata.” Show that Checkers, Chess,
and Go can be represented by nondeterministic abcs.

8-4. Design some simple self-replicating machines.


INTRODUCTION

This  chapter  summarizes  the  preceding  chapters  with  a  brief  re¬ 
view  of  current  research  on  robots  and  a  look  at  the  possible  future 
uses  of  artificial  intelligence. 


ROBOTS 

A  robot  is  a  mechanical  intelligence  capable  of  operating  in  our 
own  real-world  environment  (see  Chapter  3).  The  successful  construc¬ 
tion  of  a  robot  entails  the  integration  of  most,  if  not  all,  of  the  tech¬ 
niques  discussed  in  previous  chapters;  in  addition,  it  requires  that  a 
host  of  new  problems  be  solved.  Thus,  a  robot  must  have  some  sort  of 
sensing  and  perception  system  that  allows  it  to  detect  pattern  examples 
in  the  environment;  it  must  have  some  way  of  reasoning  about  its 
environment  (and  its  relations  to  that  environment);  and  it  must  also 
have  some  way  of  acting  upon  its  environment.  In  our  particular  real- 
world  environment  a  robot  must  confront  the  fact  that  no  description  of 
its  current  situation  (the  current  state  of  the  environment)  can  possibly 
be complete, in the sense of removing all uncertainties about the situation
or enabling the robot to answer all questions about it. There seem
to  be  three  basic  reasons  for  this  (Munson,  1971) : 

I.  Sensing  and  perception  systems  are  subject  to  accuracy 
limitations  (e.g.,  Heisenberg’s  Uncertainty  Principle)  and 
also  to  gross  failures  (e.g.,  misinterpretation  of  lines). 

II.  Many  real-world  objects  may  not  be  completely  described, 
simply  because  of  their  complexity  (e.g.,  a  human  or  a 
complex  piece  of  equipment). 

III.  Any  actions  taken  by  the  robot  in  its  environment  are 
subject  to  inaccuracies,  or  failures,  and  may  introduce 
uncertainties  rather  than  remove  them. 

As noted in previous chapters, AI researchers have been so far
only  partially  successful  in  giving  computers  sensing,  perceiving,  and 
reasoning  abilities.  It  will  therefore  come  as  no  surprise  to  the  reader 
that  they  have  also  been  only  partially  successful  in  enabling  computers 
to  perform  actions  in  the  real  world  and  in  integrating  these  abilities 
to  make  the  complete  robot.  Still,  there  has  been  some  success. 

Research  on  robots  is  currently  being  undertaken  in  the  United 
States, Great Britain, and Japan.¹ Citations to this research will be
found  in  Aida  et  al.  (1971),  J.  D.  Becker  (1969),  Coles  (1969, 
1970a,b),  Doran  (1969,  1970),  Ejiri  et  al.  (1971),  H.  A.  Ernst 
(1962),  J.  Feldman  (1967),  Fikes  (1971),  Friedman  (1969),  Hart 
et al. (1971), Hayes (1971), Hewitt (1971a), Kinoshita et al. (1971),
McCarthy  (1964a,  1968),  Munson  (1970a,b;  1971),  Nilsson  (1969, 
1970),  Paul  (1971),  Pingle  et  al.  (1968),  Popplestone  (1969), 
Raphael  et  al.  (1971),  C.  A.  Rosen  (1970),  and  Sutro  and  Kilmer 
(1969).  The  robot  sensing  and  perception  systems  that  have  been  in¬ 
vestigated  have  been  primarily  vision  and  touch,  with  some  attention  to 
hearing;  see  Astrahan  (1970),  Bobrow  and  Klatt  (1968),  Coles 
(1969),  Raj  Reddy  (1966).  Robot-reasoning  procedures  that  have 
been  used  include  heuristic  tree  search  and  theorem  proving  (both 
resolution-based  and  PLANNER-based;  also  see  Hart,  Nilsson,  and  Robin¬ 
son,  1971).  The  only  robot  effector  systems  (for  acting  on  the  environ¬ 
ment)  that  have  yet  been  developed  are  mechanical  hands  (and  arms) 
and locomotion systems. As yet there is no robot that successfully
combines  all  these  systems  in  its  performance. 

The basic nature and the current limitations of robot vision systems
are described in Chapter 5 of this book. Robots are still “effectively
blind,” at least when compared to humans. In addition to requiring
special lighting conditions, robots are unable to “see” moving
objects, even when they are moving quite slowly,² and they have only
very limited ability to make use of color and texture.

¹ Undoubtedly, the Soviet Union also conducts robot research.

Tactile  sensing  and  perception  systems  are  still  at  a  rudimentary 
stage  of  development;  however,  it  has  been  possible  (in  a  way,  some¬ 
what  natural)  to  integrate  these  systems  into  the  operation  of  effector 
systems.  Kinoshita,  Aida,  and  Mori  (1971)  described  a  computer- 
controlled  mechanical  hand  that  has  a  very  simple  tactile  sensing  and 
perception system. Figure 9-1 shows the Kinoshita hand as well as
the (more versatile, but less sensitive) computer-controlled arm and
hand in use at the Stanford Artificial Intelligence Project (Paul, 1971).
The bump-detector “whiskers” of the robot Shakey (Fig. 9-2) are integrated
into its use of locomotion (Coles, 1970a). Finally, Aida, Cordelia,
and Ivacevic (1971) present an approach to the integration of
visual and tactile perception systems.³

To  date,  the  real-world  problems  that  robots  have  been  able  to 
solve  have  been  of  the  “toy  problem”  variety,  at  the  level  of  difficulty 
of the Monkey-and-Bananas Problem discussed in Chapter 6. Ejiri et
al.  (1971)  presented  a  robot  with  the  ability  to  solve  problems  that 
involve  stacking  blocks  into  simple  configurations.  Coles  (1970a)  de¬ 
scribed  how  Shakey  makes  use  of  resolution-based  theorem  proving 
to  solve  a  problem  that  involves  deciding  to  use  a  ramp  as  a  tool  to 
climb  up  on  a  platform  and  push  a  box  off  the  platform  (see  Fig.  9-3). 
More recently, the Shakey-STRIPS configuration has been used to solve
more  “difficult”  problems  (see  Chapter  6).  Finally,  Feldman  et  al. 
(1971)  described  how  a  computer  can,  with  the  use  of  an  eye  system 
and a mechanical hand (“known affectionately as Butterfingers”),
solve the Instant Insanity puzzle by physically stacking up the blocks
in a desired configuration.⁴

Although these tasks are relatively trivial, it is to be expected that
robot  research  will  make  significant  progress  in  the  next  few  years, 
following  the  implementation  of  the  PLANNER-like  programming  lan¬ 
guages  (see  Chapters  6  and  7). 


² However, when the type of motion is restricted and predefined, robots can
be quite successful. Thus, Hieserman (1971) describes an optical-electronic
scanning  system  that  accurately  records  the  type,  owner,  and  registration  number 
of  every  freight  car  in  a  train  moving  at  speeds  up  to  80  miles  per  hour. 

³ The magazine Electronic Design (Nov. 11, 1971, p. 30) reported the development
of a highly sophisticated mechanical arm and hand, designed to be
remote-controlled by a human being. The device, called the Naval Anthropomorphic
Teleoperator (NAT), has a tactile sensing system that supplies force-feedback
to the human operator, and is so sensitive that it can be used to thread
needles. 

⁴ The problem-solving program was designed specifically for the Instant
Insanity Puzzle, and was not a “general problem solver” (see Chapter 3).

Figure 9-1. (a) Kinoshita’s versatile computer-controlled hand; (b) the
sensitive computer-controlled arm and hand used at Stanford AI
Project. (Kinoshita, Aida, Mori, 1971; Paul, 1971, reprinted with permission.)


The  harvest  of  artificial  intelligence 




Figure 9-2. Shakey. (Coles, 1970, reprinted with permission.)


A LOOK AT POSSIBILITIES

Tools and People

The  utility  of  artificial  intelligence  research  forms  an  important 
part  of  a  larger  question,  that  of  the  relation  of  technology  to  society 
and  nature.  The  urgency  of  this  larger  question  is  readily  demonstrated 
in  scientific  terms,  for  technology  is  inherently  concerned  with  the  de¬ 
velopment  of  tools  that  affect  our  environment.  Yet  scientists  and  lay¬ 
men  have  only  recently  begun  to  consider  this  question  in  detail. 

Technology  is,  simply  speaking,  both  the  tools  and  the  ability  to 
make  tools,  which  people  have  developed.  Technology  has  existed  since 
people  first  used  shovels  and  spears,  but  it  changes  its  form  with  each 
new  invention  and  discovery.  Moreover,  a  change  in  technology  may 
either  increase  our  abilities  or  limit  them.  Technology  has  made  it  pos¬ 
sible  for  a  man  to  breathe  under  water  if  he  is  wearing  scuba  tanks,  and 
difficult  for  him  to  breathe  in  a  large  city  if  he  is  not.  Changes  in  tech- 
nology  may  help  bring  about  changes  in  society.  Examples  are  the  in¬ 
vention  and  introduction  to  civilization  of  cannon  warfare,  the  printing 
press,  the  telegraph,  the  assembly-line  factory,  the  automobile,  and  the 
airplane.  Two  central  facts  about  technology  are  that  it  gives  people 
tools  that  enable  them  to  do  things  they  could  not  have  done  previously, 
and  that  the  provision  and  use  of  these  tools  may  cause  either  beneficial 
or  detrimental  consequences. 

Artificial  intelligence  and  the  development  of  computer  science 
in  general  represent  a  change  in  technology  of  the  first  magnitude,  com¬ 
parable  to  that  of  the  discovery  and  development  of  atomic  power 
sources.  This  change  in  technology  has  given  us  the  ability  to  make 
tools  (in  fact,  industries)  that  can  be  self-directing,  that  can  operate 
more and more freely of our control. We have seen that AI researchers
can  give  machines  the  ability  to  perform  many  tasks  that  previously 
could  be  performed  only  by  people.  In  particular,  such  machines  can 
solve  problems,  “reason”  about  actions  and  their  consequences,  display 
“learning”  abilities,  perceive  patterns,  control  mechanical  hands,  repair 
themselves,  and  converse  with  us  in  our  own  languages.  It  is  important 
for  us  to  ask  what  consequences  artificial  intelligence  will  have. 

Although it is too early to say with certainty what changes in
environment and society will result from AI research, we can point to
some possibilities.⁵ If the research is unsuccessful at producing a general
artificial intelligence, over a period of more than a hundred years, then
its failure may raise some serious doubt among many scientists as to the
finite describability of man and his universe. However, the evidence
presented in this book makes it seem likely that artificial intelligence
research will be successful, that a technology will be developed which is
capable of producing machines that can demonstrate most, if not all,
of the mental abilities of human beings. Let us therefore assume that
this will happen, and imagine two worlds that might result.

⁵ For the reader who is interested in comparing estimates, we cite the references
in the Bibliography to Armer, Asbell, Asimov, Barth, Berkeley, Borko,
Burck, Burke, Clarke, Demczynski, Diebold, Eastland et al., Eber, Foreign
Policy Assoc., Forrester, Fromm, Galbraith, Gordon, Greenberger, Hamming,
Hatt, Hilton, Johannesson, Kochen, Landers, Licklider, J. Martin and Norman,
A. R. Miller, Mumford, McCarthy, Newell, Parsons and Williams, Philipson,
Pierce, Polak, Pylyshyn, Reichardt, Rezler, Sackman, Samuel, Silberman, Simon,
Skinner et al., Slagle, Sprague, Taviss, Toffler, Toynbee, Vonnegut, Westin, and
Wiener.


Overmechanization of the World:
The Machine as Dictator

In the author’s research for this book and his conversations with
people over the past several years, one of the dominant viewpoints he
has encountered with respect to artificial intelligence research is the
fear that it will produce a world with “everyone living in a machine.”
This is the viewpoint that Asimov calls the “Frankenstein complex.” It
is a particularly disturbing viewpoint because it is highly plausible. We
should not pretend that AI techniques can be (or are being) used only
for peaceful, desirable purposes. Rather, we should note that artificial
intelligence can also be used to simulate those aspects of “human intelligence”
which are warlike or otherwise undesirable,⁶ and that such
mechanical intelligences could be given perception and effector systems
that would allow their “simulation” to take place in real time, in our
own environment. It is not difficult to envision actualities in which an
artificial intelligence would exert control over human beings, yet be
out of their control.

⁶ For example, Colby, Weber, and Hilf (1971) developed a computer program
(known as PARRY) that possesses “artificial paranoia.” It should be understood
that no criticism of Colby and his coworkers is intended (their research
was directed toward a better understanding of human paranoia); they have performed
a valuable service by showing that computers can behave in a way that
many people would generally think impossible, “2001” and HAL aside. Our question
now is whether computers could behave this way even if we did not want
them to.

Given that intelligent machines are to be used, the question of
their control and noncontrol must be answered. If a machine is programmed
to seek certain ends, how are we to insure that the means it
chooses to employ are agreeable to people? A preliminary solution to the
problem  is  given  by  the  fact  that  we  can  specify  state-space  problems 
to  require  that  their  solution  paths  shall  not  pass  through  certain  states 
(see  Chapter  3).  However,  the  task  of  giving  machines  more  sophisti¬ 
cated  value  systems,  and  especially  of  making  them  “ethical,”  has  not 
yet been deeply investigated by AI researchers; probably for the next
few  years  it  will  not  be  necessary.  Asimov  (1950)  presented  a  sugges¬ 
tion  of  three  rules  that  might  be  included  in  an  ethical  robot,  and  which 
have  become  known  as  Asimov’s  Three  Laws  of  Robotics: 

1.  A  robot  may  not  injure  a  human  being  or,  through  inaction, 
allow  a  human  being  to  come  to  harm. 

2.  A  robot  must  obey  the  orders  given  it  by  human  beings 
except  where  such  orders  would  conflict  with  the  first  law. 

3. A robot must protect its own existence as long as such protection
does not conflict with the first or second law.

Although  these  rules  have  not  yet  been  implemented,  there  seems  to  be 
no  a  priori  reason  why  they  could  not  be. 
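
The preliminary solution mentioned earlier, in which a state-space problem is specified so that no solution path may pass through certain states, can be sketched in a few lines. The following Python fragment is only an illustration under stated assumptions, not a program from the literature discussed here: the function name `constrained_search`, the toy operators (add one, or double), and the upper bound of 20 are all hypothetical choices made for brevity.

```python
from collections import deque


def constrained_search(start, goal, successors, forbidden):
    """Breadth-first search that never enters a state in `forbidden`.

    Returns the list of states on a shortest permitted path, or None
    if the goal cannot be reached without entering a forbidden state.
    """
    if start in forbidden:
        return None
    frontier = deque([[start]])   # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in successors(state):
            # Forbidden states are simply never generated as path members.
            if nxt in forbidden or nxt in visited:
                continue
            visited.add(nxt)
            frontier.append(path + [nxt])
    return None


# Hypothetical toy problem: states are integers; the operators are
# "add 1" and "double", and states above 20 are not generated.
def successors(n):
    return [m for m in (n + 1, n * 2) if m <= 20]


unconstrained = constrained_search(1, 8, successors, forbidden=set())
constrained = constrained_search(1, 8, successors, forbidden={4})
blocked = constrained_search(1, 8, successors, forbidden={2})

print(unconstrained)  # [1, 2, 4, 8]
print(constrained)    # [1, 2, 3, 6, 7, 8]
print(blocked)        # None
```

In this sketch the prohibition costs something: the permitted path is longer than the unrestricted one, and forbidding a state that every path must cross leaves no solution at all. A list of forbidden states is, of course, a far cruder device than a value system.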

The question of control should be coupled with the “lack of understanding”
question; that is, intelligent machines
might be too complicated for us to understand in situations that require
real-time analyses (see the discussion of evolutionary programs in
Chapter 8). We could conceivably always demand that a machine give
a complete output of its reasoning on a problem; nevertheless, that
reasoning might not be effectively understandable to us if the problem
itself were to impose a time limit for producing a solution. In such a
case, if we were to act rationally, we might have to follow the machine’s
advice without understanding its “motives.” A good solution would be
to use other machines (“reasoning checkers”) to corroborate the reasoning
and advice of the first machine; in such a case it would be essential
to understand any “side effects” that the first machine’s reasoning might
have on the reasoning checker.
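
The idea of a “reasoning checker” can be made concrete in a very small way. The sketch below assumes, purely for illustration, that the first machine’s advice takes the form of a list of states (a plan) and that the checker holds its own independent copy of the legal operators; the names `check_plan` and `successors` and the toy integer problem are hypothetical. It shows only that verifying a proposed solution can be much simpler than finding one, and it does not address the question of side effects raised above.

```python
def successors(n):
    """Checker's own copy of the legal operators: add 1, or double."""
    return [m for m in (n + 1, n * 2) if m <= 20]


def check_plan(plan, start, goal, successors, forbidden=frozenset()):
    """Replay a proposed plan step by step; return (ok, reason)."""
    if not plan or plan[0] != start:
        return False, "plan does not begin at the start state"
    # Every step must be a legal application of some operator.
    for prev, nxt in zip(plan, plan[1:]):
        if nxt not in successors(prev):
            return False, f"illegal step {prev} -> {nxt}"
    # No state on the path may be one that people have ruled out.
    for state in plan:
        if state in forbidden:
            return False, f"plan passes through forbidden state {state}"
    if plan[-1] != goal:
        return False, "plan does not end at the goal"
    return True, "plan verified"


print(check_plan([1, 2, 4, 8], 1, 8, successors))
print(check_plan([1, 2, 4, 8], 1, 8, successors, forbidden={4}))
print(check_plan([1, 3, 6, 7, 8], 1, 8, successors))
```

The checker here never needs to know how the plan was produced; it needs only the plan itself and the rules against which to test it.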

It has been suggested that an intelligent machine might arise accidentally,
without our knowledge, through some fortuitous interconnection
of smaller machines (see Heinlein, 1966). If the smaller
machines each helped to control some aspect of our economy or defense,
the accidental intelligence might well act as a dictator. (A similar theme
has been stated in Reich, 1971, and Eiseley, 1970, and by many others,
who have suggested that humanity and its machines are evolving into an
autonomous meta-organism.) It seems highly unlikely that this will
happen, especially if we devote sufficient time to studying the non-accidental
systems we implement.

A more significant danger is that artificial intelligence might be
used to further the interests of human dictators. A limited supply of
intelligent machines in the hands of a human dictator might greatly
increase his power over other human beings, perhaps to the extent of
giving him complete censorship and supervision of the public. The
vision of Big Brother in Orwell’s 1984 could become a mechanical
reality. Recent work on an “electronic battlefield” and the development
of computerized data files containing individual information about many
American citizens (see A. R. Miller, 1971; Sprague, 1969) illustrate
this possibility. In this regard, it should be noted that most AI research
in the United States is funded by the Department of Defense (specifically,
by the Advanced Research Projects Agency, or ARPA). Indeed,
in this country most scientific research of any sort is funded by the
Department of Defense. The AI research supported by the ARPA Office
of Information Processing Techniques is unclassified, available to the
public, and not intended for any Big Brothers. Still, science should be
supported in its own light, or else our real problems may never be solved;
namely, we must replace our inefficient technology before the finite
resources of our planet are wasted.

Finally,  industrial  automation  is  currently  foreseen  as  one  of  the 
chief  applications  of  artificial  intelligence.  It  is  possible  that  a  factory 
could  be  completely  operated  by  machines.  Indeed,  it  is  possible  that 
most  of  the  physical  and  paper-work  drudgery  necessary  to  sustain  an 
entire  economy  could  be  performed  by  intelligent  machines.  If  this 
were  to  happen  too  quickly,  some  people  might  not  be  able  to  adjust  to 
their increased leisure time (see Vonnegut, 1952).

The  Well-Natured  Machine 

Let  us  now  paint  another,  more  positive  picture  of  the  world  that 
might  result  from  artificial  intelligence  research.  Hints  about  this  world 
have  appeared  throughout  this  book,  and  it  is  time  to  look  more  clearly 
at  some  of  its  qualities.  It  is  a  world  in  which  man  and  his  machines 
have reached a state of symbiosis, both with each other and with the
rest of the environment taken as a whole (see Landers, 1966). “Symbiosis”
is a word from biology, referring to a relationship between two
or  more  species,  which  is  mutually  beneficial  to  the  members  of  each. 
It  is  an  apt  word  because  automation  and  artificial  intelligence  research 
are  concerned  with  the  development  of  a  “species  of  machines”  that 
will  simulate  at  least  two  of  the  major  abilities  of  living  organisms: 
self-repair  and  intelligence  (and  perhaps  self-reproduction). 

The  benefits  humanity  might  gain  from  achieving  such  a  symbiosis 
are enormous. As mentioned above, it may be possible for artificial
intelligence  to  greatly  reduce  the  amount  of  human  labor  necessary  to 
operate  the  economy  of  the  world  (Simon,  1962;  McCarthy,  1971). 
Computers and AI research may play an important part in helping to
overcome the food, population, housing, and other crises that currently
grip the earth, and in helping us to understand the dynamics of
the relevant “world system” (Forrester, 1971). They may also play a
significant role in the planning of technological systems, the decentralization
of cities, and the discovery and prevention of ecological disasters
(see  Graves,  Pingry,  and  Whinston,  1971;  Langlois,  1971;  Olsen, 
1971;  Ross,  1971).  Artificial  intelligence  may  eventually  be  used  to 
contain  the  “information  explosion”  (see  Kochen,  1967;  Licklider, 
1965)  and,  perhaps,  to  partially  automate  the  development  of  science 
itself (Buchanan, Feigenbaum, and Lederberg, 1971; Hearn, 1971;
Paton, 1970). As Toffler (1970) suggested, a sophisticated, computer-controlled
economic system could give us a greater diversity of choices
for  housing  and  other  goods  than  we  have  ever  had  in  the  past.  The 
future  may  well  see  the  development  of  an  “information  utility” 
(Sprague,  1969;  Armer,  1968)  that  will  enable  each  individual  of  the 
general  public  to  have  a  computer  terminal  in  his  home  that  will  give 
him  access  to  the  current  “public  information”  of  the  world,  as  well  as 
to  the  abilities  of  a  general  problem-solving  artificial  intelligence,  for  a 
price  roughly  comparable  to  that  he  currently  pays  for  electricity  or 
water.  Perhaps  artificial  intelligence  will  someday  be  used  in  automatic 
teachers  that  will  be  as  good  at  teaching  as  people  are,  and  perhaps 
mechanical  translators  will  someday  be  developed  which  will  fluently 
translate human languages. And (very perhaps) the day may eventually
come  when  the  “household  robot”  and  the  “robot  chauffeur”  will  be  a 
reality  (see  Schmidt,  1971). 

There  is  no  space  to  allow  discussion  of  these  possibilities  in  detail, 
but  the  Bibliography  references  will  enable  the  reader  to  make  a  good 
start  at  studying  them  himself,  if  he  likes.  In  some  ways  it  is  reassuring 
that progress in artificial intelligence research is proceeding at a relatively
slow but regular pace. It should be at least a decade before any
of  these  possibilities  becomes  an  actuality,  which  will  give  us  some 
time  to  consider  in  more  detail  the  issues  involved. 

Even  if  all  these  advances  should  occur,  there  will  still  be  work  for 
people to do, and people will still be able to feel pain or be unhappy.
There  will,  however,  be  more  time  for  people  to  do  other  things  than 
work,  and  less  cause  for  them  to  suffer  needlessly.  So,  the  harvest  of 
artificial  intelligence  may  be  for  the  good  of  humanity. 

Figure 9-4. Program written by a child. (See Papert, 1971.)

Augmented state transition network 298

Automatic  programming  264-269 


Automation  397-398 
Automaton  33,  37,  42-48,  54,  298, 
345,  361 

Automobile  273-274,  394 
Axiom  34 

Back-up  evaluation  131,  144 
Behavior  2,  11,  33 
Benefits  of  artificial  intelligence  397- 
398 

Bidirectional  search  101 

Bits  (binary  digit)  58 

Blind  procedure  96,  375-378 

Book  learning  145 

Bounded  search  97-98 

Brain  4,  12-17,  21-24,  55-56,  61-62 

Breadth-first  search  96-97,  138,  233 

Bridge  117,  121,  154,  158 

Broca’s  area  25 

Calculus 297, 310, 315, 328
Cantorism  63 
CARPS  297,  305,  315 
Causality  79,  163,  254,  274  fol. 
Ceiling  function  58-59 
Cell  20-28,  55-56,  346-355,  382 
Cellular automaton (ABC) 78, 344-350,
357-365, 381, 385
Certainty  79 
Chance  117-121 
Character  recognition  70 
Checkers  1-4,  88,  105-108,  117-128, 
138-146,  149,  165,  372,  396 
Cheshire  cat  350  fol. 
Chess 1, 29, 60, 88, 108, 117-128,
135-137,  146-149,  163-165,  385 
Chimpanzee  273,  280 
Choice  118,  162,  303 
Church’s  thesis  34,  44,  49,  216 
Classification  170 
Clause  223-226,  242-243,  270 
Coincidence  predicate  191 
Communication  106,  274,  334,  374 
Comparison  path  309 
Competence  67,  103,  107 
Competition  122 
Compiler  291 
Complexity  334,  388 
Comprehension  312 
Computer  1,  25-29,  51-62,  65-69, 
103-107,  117-122,  129,  162,  182 
Concept  272,  336-337 
Conditioning 11 fol.

Configuration  348,  361-364,  389 
Conflict  353-354,  385 
Confusion-of-patents  Problem  111, 
270 

CONNIVER  210,  269 
Conservation  10 
Constructive  proof  253,  268 
Context  294 

Context-free  language  289-290 
Context-sensitive  language  289-290 
Contingent  formula  222 
Continuous  phenomenon  40-41 
Contouring  188 
Contraction rule 221
Control  28-29,  45,  49,  81,  371-373, 
379,  395-397 

Conventional  machine  60 
Conversation  312-327,  329,  331 
Core  memory  59 
Counter  84 

Crossword  puzzle  338-340 
Crypt  addition  112,  340 
Cut  rule  221 
Cycle  60 

Dangers  of  artificial  intelligence  395- 
397 

Data base 182, 239-242, 255-260,
370, 397

Data  structure  82,  109,  281,  370 
Decentralization  of  cities  398 
Decidable  277 
Decision  matrix  154 
Decision  theory  330 
Definition plane 308-310
Delay  359,  384 
Demon  selection  210-211 


DENDRAL  149,  216 
Denotation  79 
Density  function  331 
Department  of  Defense  397 
Depth-first  search  96-98,  115 
Description  53,  170,  203-209,  274, 
371,  378 
Desirability  79 
Determinancy  40 
Dictatorship  395-397 
Dictionary  310 
Difference  operator  75 
Difference  set  100 
Differential equation 64n2-10
Digital  topology  192 
Discrete  phenomenon  40-45,  63-64, 
343 

Disk  memory  59 
DNA  21 

Dolphin  7,  30,  273,  337 
Dynamic  search  134-138 

Ecology  398 

Edge  enhancement  188-191 
Effector  system  388  fol. 
Electroconvulsive  shock  therapy  (ECS) 
18  fol. 

Electronic  battlefield  397 
Embedding  of  languages  291 
Emitter  274 
Encoding  function  365 
Engram  20 
Entropy  210 

Environment  9,  21,  68-69,  305 
Epistemological  adequacy  78,  165, 
253-254 

Equivalence  79,  291 
Evaluation  function  96,  100-101, 
105-106,  140,  143-144,  149 
Evolutionary  stagnation  344,  377 
Evolutionary  system  71,  344,  361- 
362,  368,  375-380 
Example  construction  268 
Existence  79 
Expansion  rule  221 
Extensible  language  272,  277,  291 
Eye  25-28,  177,  282-284,  210 

Failure  388 
FDS  77,  87,  102 
Feature  303 

Fewest-components  strategy  236-238 
15-Puzzle  85-87,  107-109,  245 
Finite  description  34-35,  44,  55,  88, 
62-65,  82-85,  100-104,  163 
Finite-state  machine  45-49,  298 



Formula  217-222 
Forward  pruning  134-137 
Frame  problem  255,  305 
Freedom  379 
Free  will  163 

Function  43-44,  49,  360,  364-365 
Functional  abstraction  106 
Fuzzy  logic  268n6-2 
Fuzzy  program  104,  257 

Game  1,  69-70,  75,  117-121,  159, 
293,  338-339 

Game theory 118-119, 123, 163-164
Game tree 124-138

General  game-playing  programs  70, 
159-162 

Generality  70,  75-77,  108-109,  159- 
162,  333 

Generalization  339 
Generation  94-106,  134-135,  294, 
305, 312, 375-376
Generator  94-106,  135 
Geometry  34 
Gestalt  effect  329 
Global  analysis  122  fol. 

Gnomon  104  fol. 

GO  61,  117-128,  137,  149-151,  165, 
385 

Goal 72, 76, 81-85, 97, 151
GPS 71-76, 87, 102
k3QA  329,  340 

Grammar 73, 277-279, 285-286, 302,
312, 333, 337

Grammatical  choice  303 
Grammatical  inference  73,  182,  272, 
328,  330-334,  339 

Graph  88,  110,  179,  249-252,  272, 
305-306,  345-350,  364,  380 
dependency  310 
description 203-206, 209
refutation  231 
state  space  119 
Grid 346, 352, 357, 364, 382
Group  345-361,  380 

Habituation  27 
HACKER  269n6-6 
Halting  problem  50,  56,  221 
Hand,  mechanical  388-390,  394 
Happiness  297,  305,  316  fol. 

Hearing  388 

Heisenberg’s  Uncertainty  Principle 
388 

Heuristic  adequacy  254-255 
Heuristic  block  152  fol. 
Heuristic search 67, 81, 117, 94-103,
233-238,  312,  388 
Hex  122,  159 

Hierarchical  systems  38n2,  60,  210, 
260,  333,  344,  368,  378 
Hippocampus  19,  22 
Homogeneity 311
Human  intelligence:  see  Intelligence 
(human) 

Hybrid  program  368 

Illuminator  182,  184  fol. 

Imaging  eye  186 
Imperfect  information  game  121 
Induced  response  27 
Inference  216,  220,  268n6-2 
Infinite  systems  221 
Infinity  63n2-5 

Information  21,  58  fol.,  74,  209  fol., 
273,  338,  340,  398 
Initial  state  50,  81,  346,  365 
Instant  Insanity  389 
Integrated  circuit  60-61 
Integration 294, 312
Intelligence 

human  1,  6,  62,  69,  117,  210,  240, 
272 

machine  1,  3  fol.,  55,  69,  328 
natural 2, 5, 30n1-3, 36, 55 fol.,
70 

Intelligence  test  6 
Inter-Lingua  181 
Interpreter  291 
Introspection  7  fol. 

Invariance  10 
Invention  74 
IPL 67, 108n3-1

Jigsaw  puzzle  210,  312 
Jumping-spot  eye  184 

Kalah 122, 165

King-and-the-Wizards  problem  270- 
271 

Knowledge  68,  79,  336,  344 
Korsikoff’s  syndrome  20 
Kriegsspiel  121,  163 

Language 67-81 passim, 107, 151,
175,  215,  257,  286,  289-290,  377 
games  293,  338  fol. 
graph  333 

human  1,  6  fol.,  79,  291,  333 
learning  272,  312 
natural  274  fol.,  293,  331  fol. 
pattern perception 180 fol., 209
programming  69,  281,  293,  312, 
335,  337,  389 

understanding  70,  215,  293  fol., 

304 

Lattices 109n3-4, 369
Learning  7,  67,  103,  105,  143  fol., 
170,  312,  373,  378 
Leisure  time  397 
Level of competence 67, 103, 107,
209

Life  349,  380 
Light  7 

Line  detection  187-192,  210 
LISP 67, 108n3-1, 210, 264, 267
List  structure  179,  264,  315,  333 
Logic  28  fol.,  61,  106,  149,  216-219, 
230,  254,  268n6-2,  269n6-6 
Loop  103 
LR(1)  338 

Mac  Hack  Six  147,  149 
Machine  36,  44,  163,  173,  193-194, 
274,  291,  343,  373-378 
fact  retrieving  68 
intelligence 1-4, 29n1-1, 30n1-1,
69,  71,  328 
memory  51 
Machine  language  291 
Macro-operator  107 
Macro-state  107 

Mathematical theory 1, 34-39, 62,
73,  217 

Matrix  179,  186,  224-225 
Maze of Dedalus 110, 211
Mealy  automaton  361 
Meaning  272-274,  304,  332,  339 
Means-end  analysis  75-76 
Mechanical  hand  388-390,  394 
Mechanistic  reasoning  63 
Medial  axis  transformation  192 
Memory  18,  19,  45,  51,  58-62,  64, 
102,  372 
MENS  310 
Mentalism  336-337 
Mesa  phenomenon  377 
Metamathematics  216,  268n6-l 
Metaphysical  adequacy  78,  166 
Micro-electrode  27 
Milepost  paradigm  104 
Minimax  analysis  123,  127-135 
Missionaries-and-Cannibals  Problem 
76, 107, 110, 270
Modal  logic  268 

Model 67, 103, 106-107, 165, 220,
274,  304,  310,  336-337 


Monkey-and-Bananas  Problem  113, 
246-254,  389 

Moore  neighborhood  349 
Motivation  28,  396 
Move  118  fol. 

Multiple  102 
Multiprocess  343 
Music  337 
Mutation  377 

Mutilated  Checkerboard  Problem 
113, 270

Natural intelligence 2, 5, 30n1-3, 36,
55  fol.,  70 

Natural  language  274  fol.,  293,  331 
fol. 

Naval  Anthromorphic  Teleoperator 
389 

Near-miss  205 

Neighborhood  348  fol.,  351,  355,  357 
fol.,  363,  382 
Nerve  nets  334 
Neuron  12-28 

Next-move  function  49,  57,  60,  65 
(Ex.  2-4) 

NIM  122,  161 
Noise  removal  187  fol. 

Nonchance  121,  123,  137 
Nondeterminancy 40n-4, 104, 257,
268 

Nondiscrete functions 40 fol., 63 fol.
Nonperiodicity  40n-4 
Nonzero  probability  57 

Object  recognition  75  fol.,  193-201, 
388 

Occurrence  functions  40,  63  fol. 
Operator  75  fol.,  81  fol.,  85  fol.,  89, 
93,  109n3-4,  192,  219  fol. 
Opponent-oriented strategies 165n4-6
Optical  character  recognition  173 
Optics  59,  61,  171,  182,  184  fol. 
Orban’s  monkey  261,  262-263 
Ordering  strategies  100  fol.,  105,  233, 
236-238 

Pandemonium  27  fol.,  210n5-l,  372 
Paradigm  67  fol.,  103,  109n3-3,  215, 
332-333,  339,  372  fol. 

Parallel  process  38n-2,  343  fol. 
Parametric  function  139  fol. 

Paranoia  323,  395 
PARRY  3,  323,  395 
Parse  295,  300 
Partial-function  logic  269n6-6 
Partial recursive functions 49,
108n3-3

Partial  specification  12-1 'i,  74  fol.,  85 
Pattern  classification  172  fol.,  212 
(Ex. 5-3)

Pattern matching 170, 180, 182, 212
(Ex. 5-3), 258, 296
Pattern  perception  182,  209  fol.,  332, 
394 

Pattern  recognition  70,  151,  178  fol., 
182,  212  (Ex.  5-3),  330-334,  378 
Patterns  68,  170,  332,  379,  387 
Payment  function  118  fol.,  122,  125, 
164 

Peg solitaire 114 (Ex. 3-13)

Pentamino  350 
Perception  182 

Perception  systems  182,  372,  388  fol. 
Perceptual  generalization  333,  339 
Perfect  information  game  118,  121 
Phenomenon  78,  348,  378 
Phrase-structure  language  286,  289- 
290 

Picture  enhancement  187-192 
Plan  104,  372 
Plane  geometry  107 
PLANNER  106,  181,  210,  257-264, 
302,  338,  344  fol.,  370  fol.,  388 
fol. 

Planning  67,  76,  103-105,  257 
Plausibility ordering 79, 134 fol., 138
Plausible  move  generator  147,  149 
Play  118 

Poker 1, 117, 121, 151-154, 165n4-6,
372

Polycephalic  Turing  machine  53,  65 
(Ex.  2-4) 

Polynomial  evaluation  function  143 
fol. 

POP  command  299 
Positional  game  159 
Predicate  calculus  81,  233,  244,  252 
fol., 261, 305, 310
in  state-space  problems  245  fol. 
in  theorem  proving  215-252 
Probability 58, 79, 100, 118, 297,
316 fol., 339
Problem 

definition  68,  71-72 
domain  9,  69,  305,  330,  335 
recognition  339 
reduction  88-93 
representation  67,  103,  107,  295 
solving 1, 5, 8, 65, 293, 329, 344,
373

tree  101,  123 


Programmer 108n3-1, 298, 302, 328,
338 

Proposition  219 
Protocol  328  fol.,  330,  379 
PROW  267 
PUSH  command  299 
Pushdown  store  300 
Puzzle  85 

QA3  241,  267 
QA4  210,  344 

Quantum theory 61, 63n2-5, 78
Quench  point  193 

Question-answering  68,  105,  268n6-3, 
297,  328,  335,  378  fol. 

Question  tennis  338 

Reasoning  67,  103,  119,  215  fol., 
238-244 

Reasoning  checkers  396 
Reasoning program 71, 77-80, 368,
378

Recognition 201-209, 272, 274, 378
Recursion 10, 49, 103, 108n3-3, 151,
292,  297,  333 
Redundancy  278,  292 
REF-ARF  77n2,  102,  104 
Refinement  strategy  233-236 
Reflectance  184 

Refutation graph 231 fol., 235, 237
fol.,  249,  250  fol. 

Region  192  fol.,  195  fol.,  200 
Register  299 
Relativity  theory  63n2-6 
Representation  77,  81,  102,  107,  171, 
295 

Resolution  procedure  221  fol.,  232, 
248 fol., 270 (Ex. 6-4), 388 fol.
Risch  algorithm  95 
RNA  21,  276 

Robot 175, 177 fol., 255, 270 (Ex. 6-8),
368,  372  fol.,  387-393,  396,  398 
Rote  learning  143  fol. 

Rules  118,  216,  220,  221-222,  272, 
312,  328 

SAD-SAM  305 
SAINT  90,  91  fol.,  110n3-7 
San  Diego  Problem  103,  114-115 
Scene analysis 187-201
Scientific  method  35-36 
Scrabble  338 

Search  technique  67,  94,  117,  388 
SEE  194  fol.,  199  fol. 

Self-affecting  program  106,  163n4-l, 
361-368 
Self-description 344
Self-improvement  335  fol.,  344,  366 
Self  model  304,  336 
Self-organizing  system  71,  329,  344, 
362,  368,  372 

Self-reference  273,  278,  292 
Self-repair  362,  397 
Self-reproducing system 344, 348,
361 fol., 363-368, 397
Semantic information 3, 62, 209, 304,
273, 340

Semantic memory 306, 321 fol.
Sensory  information  storage  (SIS)  17 
Sentence  82,  272  fol.,  303 
Sequence  prediction  173  fol. 

Sets  49,  62n2-4,  n2-5,  82,  203,  331 
Seven  Bridges  of  Konisberg  Problem 
76  fol. 

Shallow  search  134  fol. 

SHRDLU  304,  311  fol.,  324-327 
Signature  table  141  fol.,  145  fol. 
Simplification  strategy  233-234 
Simulation  4,  28-29,  30nl-3,  69  fol., 
266,  330,  359 

SIN  91,  93,  110n3-7,  216 
SIR 311, 318-320
Situation  119,  294  fol. 

Situational  fluent  245  fol. 
Situation-space  problem  72,  80-94 
Skolem  function  224  fol.,  251 
Sliding  Block  Puzzle  113  (Ex.  3-10) 
Smoothing  187 
Solution  72,  74  fol.,  81,  83-85 
Speech  293 
Spider  automata  384 
Stagnation  344,  337 
State  48,  54,  58,  82  fol.,  85,  348,  357, 
358 

initial  346,  365 

State-space  problem  67,  80-94,  117, 
152, 215, 244-248, 255, 396
Static-evaluation function 129-131,
139 fol.

Step  function  40,  42-44,  63n2-8 
Strategy  72,  118-119,  233-238,  264, 
300 

Strategically  isomorphic  162 
String  42-44,  53  fol.,  73,  179,  218, 
272,  337-340,  364,  369 
STRIPS 255 fol.

Structure  175,  182,  201-209,  243, 
257,  333,  370 

Student 296, 305, 313 fol.

Subgoal  89,  104 

Sunflower  pattern  171,  330  fol. 

Surround  inhibition  25  fol. 


Symbol  34,  48,  51,  217,  245,  272,  354 
dummy  54 
logic  219  fol. 
predicate  calculus  216 
Symbolic  integration  90,  95,  107,  110 
Synapse  12  fol.,  15  fol.,  21 
Synergy  329 
Syntactic  ambiguity  295 
Syntax  281, 294  fol.,  297 
System  221,  344,  361-368,  375-379, 
380,  388  fol. 
self-describing  380 
self-organizing  329,  344,  368,  372, 
380 

self-reproducing  380 
Systemic  grammar  277,  302 

Tactile  perception  388-389 
Tautology  233 
Teaching  335,  398 
Technological  systems  398 
Technology  394 
Teleoperator  389 
Template  180  fol.,  242,  296 
Temporality  79 
Tesselation  automaton  381 
Theorem  proving  70,  75,  106,  215, 
221,  222-223,  304,  388  fol. 
Theoretical  parallel  machines  60  fol. 
Theoretical  serial  machines  60  fol. 
Three  Coins  Problem  80  fol.,  83-84, 
88, 101

Tic-Tac-Toe  122,  159,  160  fol., 
163n4-l 

Time  37-40,  254,  370 
Tools  394 

Toroidal  grid  346,  352 
Tower  of  Hanoi  Problem  76,  113 
(Ex.  3-9) 

Toy  problem  389 
Training  sequence  335  fol. 
Transformational  grammar  277,  312 
Transition  function  45,  298,  360,  364 
Translator  291,  338,  378,  398 
Traveling-Salesman  Problem  85,  112 
(Ex.  3-4) 

Tree search 134, 135-136, 146, 149,
365, 369, 388

Tromino  364 

Turing  machine  44  fol.,  48-51,  174, 
282,  298,  340,  350-354,  350-359 
patterns  173  fol. 
polycephalic  53,  65  (Ex.  2-4) 
sequence  prediction  175 
system  inference  73 
universal  53-55,  74 
Turing’s test 2 fol., 31n1-4, 68
Turing’s  thesis  34,  44,  49,  56 

Unary  coding  49-50,  53 
Uncertainty  Principle  388 
Undecidability  theory  2,  3,  221 
Understanding  274,  336 
Unification  procedure  226-229 
Unit-preference  strategy  236  fol. 
Universal  constructor  362,  364,  384 
fol. 

Universal  language  279,  285,  337 
Universal  operator  220 
Universal  Turing  machine  53-55,  74 
Universe  of  discourse  218 
Unsolvable  problem  56,  76 

Value  systems  396 
Variable  analogy  242-243 
Variable  prefix  180 


Variable-valued  reasoning  378 
Venn  diagram  240 
Verification  180 
Vertex  142,  179,  194,  199 
Visual  patterns  1,  25  fol.,  28,  182, 
202,  331  fol.,  388  fol. 

Von  Neumann  neighborhood  363 

Water-jug Problems 112-113 (Ex.
3-6) 

Web  language  333 
Wernecke’s  area  24 
Whiskers  389 
Whistle  language  337 
Will  379 
World  system  398 

Zero  delay  359 

Zero-sum  games  122  fol.,  137 

Zorba  241,  242  fol. 
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LASERS  AND  HOLOGRAPHY,  Winston  E.  Kock.  Sound  introduction  to 
burgeoning  field,  expanded  (1981)  for  second  edition.  84  illustrations.  160pp.  55^  x 
8‘4.  (EUK)  24041-X  Pa.  13.50 

FLORAL  STAINED  GLASS  P AXTERN  BOOK,  Ed  Sibbett,  Jr.  96  exquisite  floral 
patterns— irises,  poppie,  lilies,  tulips,  geometries,  abstracts,  etc,— adaptable  to 
innumerable  stained  glass  projects.  64pp.  8!4  x  M.  24259-5  Pa.  |3.50 

THE  HISTORY  OF  THE  LEWIS  AND  CLARK  EXPEDITION,  Meriwether 
Lewis  and  William  Clark.  Edited  by  Eliott  Coues.  Great  classic  edition  of  Lewis  and 
Clark’s  day-by-day  journals.  Complete  1893  edition,  edited  by  Eliott  Coues  from 
Biddle’s  authorized  1814  history.  1508pp.  5%  x 

21268-8,  21269-6,  21270-X  Pa.  Three-vol.  set  $22.50 

ORLE  Y  FARM,  Anthony  Trollope.  Three-dimensional  tale  of  great  criminal  case. 
Original  Millais  illustrations  illuminate  marvelous  panorama  of  Victorian  society. 
Plot  was  author’s  favorite.  736pp.  5?^  x  24181-5  Pa.  $10.95 

THE  CLAVERINGS,  Anthony  Trollope.  Major  novel,  chronicling  aspects  of 
British  Victorian  society,  personalities.  16  plates  by  M.  Edwards;  first  reprint  of  full 
text.  412pp.  5%  X  8^.  23464-9  Pa.  $6.00 

EINSTEIN’S  THEORY  OF  RELATIVITY,  Max  Born.  Finest  semi-technical 
account;  much  explanation  of  ideas  and  math  not  readily  available  elsewhere  on 
this  level.  376pp.  5%  x  8J^.  60769-0  Pa.  |5.00 

COMPUTABILITY  AND  UNSOLVABILITY,  Martin  Davis.  Classic  graduate- 
level  introduction  th  theory  of  computability,  usually  referred  to  as  theory  of 
recurrent  functions.  New  preface  and  appendix.  288pp.  5%  x  8J^.  61471-9  Pa,  |6.50 

THE  GODS  OF  THE  EGYPTIANS,  E.A.  Wallis  Budge.  Never  excelled  for 
richness,  fullness;  all  gods,  goddesses,  demons,  mythical  figures  of  Ancient  Egypt; 
their  legends,  rites,  incarnations,  etc.  Over  225  illustrations,  plus  6  color  plates. 
988pp.  6%  X  9%,  (EBE)  22055-9,  22056-7  Pa.,  Two-vol.  set  |20.00 

THE  I  CHING  (THE  BOOK  OF  CHANGES),  translated  by  James  Legge.  Most 
penetrating  divination  manual  ever  prepared.  Indispensable  to  study  of  early 
Oriental  civilizations,  to  modem  inquiring  reader,  448pp.  5%  x  8^. 

21062-6  Pa.  $6.50 

THE  CRAFTSMAN’S  HANDBOOK,  Cennino  Cennini.  15th-century  handbook, 
school  of  Giotto,  explains  applying  gold,  silver  leaf;  gesso;  fresco  painting, 
grinding  pigments,  etc.  142pp.  6%  x  9%.  20054-X  Pa.  |3.50 

AN  ATLAS  OF  ANATOMY  FOR  ARTISTS,  Fritz  Schider.  Finest  text,  working 
book.  Full  text,  plus  anatomical  illustrations;  plates  by  great  artists  showing 
anatomy,  593  illustrations.  192pp.  7%  x  lOi^.  20241-0  Pa.  $6.50 

EASY-TO-MAKE  STAINED  GLASS  LIGHTCATCHERS,  Ed  Sibbett,  Jr.  67 
designs  for  most  enjoyable  ornaments;  fruits,  birds,  teddy  bears,  trumpet,  etc.  Full 
size  templates.  64pp.  8M  x  11.  24081-9  Pa.  $3.95 

TRIAD  OPTICAL  ILLUSIONS  AND  HOW  TO  DESIGN  THEM,  Harry  Turner. 
Triad  explained  in  32  pages  of  text,  with  32  pages  of  Escher-like  patterns  on 
coloring  stock.  92  figures.  32  plates.  64pp.  8^x  11.  23549-1  Pa.  $2.95 


CATALOG  OF  DOVER  BOOKS 

SMOCKING:  TECHNIQUE;  PROJECTS,  AND  DESIGNS,  Dianne  Durand. 
Foremost  smocking  designer  provides  complete  instructions  on  how  to  smock. 
Over  10  projects,  over  100  illustrations.  56pp.  S%  x  11.  23788-5  Pa,  $2.00 

AUDUBON’S  BIRDS  IN  COLOR  FOR  DECOUPAGE,  edited  by  Eleanor  H. 
Rawlings  24  sheets,  37  most  decorative  birds,  full  color/ on  one  side  of  paper. 
Instructions,  including  work  under  glass.  56pp.  SVa  x  11.  23492-4  Pa.  $3.95 

THE  COMPLETE  BOOK  OF  SILK  SCREEN  PRINTING  PRODUCTION,  J.I. 
Biegeleisen.  For  commercial  user,  teacher  in  advanced  classes,  serious  hobbyist. 
Most  modern  techniques,  materials,  equipment  for  optimal  results.  124  illus¬ 
trations.  253pp.  5^  X  SVz.  i  21 100-2  Pa.  $4.50 

A  TREASURY  OF  ART  NOUVEAU  DESIGN  AND  ORNAMENT,  edited  by 
Carol  Belanger  Grafton.  577  designs  for  the  practicing  artist.  Full-page,  spots, 
borders,  bookplates  by  Klimt,  Bradley,  others.  144pp.  S%  x  im.  24001-0  Pa.  $5.95 

ART  NOUVEAU  TYPOGRAPHIC  ORNAMENTS,  Dan  X.  Solo.  Over  800  Art 
Nouveau  florals,  swirls,  women,  animals,  borders,  scrolls,  wreaths,  spots  and 
dingbats,  copyright- free.  100pp.  8/4  x  11.  24366-4  Pa.  $4.00 

HAND  SHADOWS  TO  BE  THROWN  UPON  THE  WALL,  Henry  Bursill. 
Wonderful  Victorian  novelty  tells  how  to  make  flying  birds,  dog,  goose,  deer,  and  14 
others,  each  explained  by  a  full-page  illustration.  32pp.  6H  x  9M.  21779-5  Pa.  $1.50 

AUDUBON’S  BIRDS  OF  AMERICA  COLORING  BOOK,  John  James  Audubon. 
Rendered  for  coloring  by  Paul  Kennedy.  46  of  Audubon’s  noted  illustrations: 
red-winged  black-bird,  cardinal,  etc.  Original  plates  reproduced  in  full-color  on  the 
covers.  Captions.  48pp.  8/4x11.  23049-X  Pa.  $2.25 

SILK  SCREEN  TECHNIQUES,  J.I.  Biegeleisen^  M. A.  Cohn.  Clear,  practical, 
modern,  economical.  Minimal  equipment  (self-built),  materials,  easy  methods.  For 
amateur,  hobbyist,  1st  book.  141  illustrations.  185pp.  6!4  x  9M.  20433-2  Pa.  $3.95 

101  PATCHWORK  PATTERNS,  Ruby  S.  McKim.  101  beautiful,  immediately 
useable  patterns,  full-size,  modern  and  traditional.  Also  general  information, 
estimating,  quilt  lore.  140  illustrations.  124pp.  7%x  20773-0  Pa.  $3.50 

READY-TO-U^E  floral  DESIGNS,  Ed  Sibbett,  Jr.  Over  100  floral  designs 
(most  in  three  size^y  of  popular  individual  blossoms  as  well  as  bouquets,  sprays, 
garlands.  64pp.  8^/4  X  1 1 .  23976-4  Pa.  $2.95 

AMERICAN  WILD  FLOWERS  COLORING  BOOK,  Paul  Kennedy.  Planned
coverage  of  46  most  important  wildflowers,  from  Rickett’s  collection;  instructive  as
well  as  entertaining.  Color  versions  on  covers.  Captions.  48pp.  SVi  x  H, 

20095-7  Pa.  $2.50 

CARVING  DUCK  DECOYS,  Harry  V.  Shourds  and  Anthony  Hillman.  Detailed
instructions  and  full-size  templates  for  constructing  16  beautiful,  marvelously 
practical  decoys  according  to  time-honored  South  Jersey  method.  70pp.  d'A  x  \2yi.
24083-5  Pa.  $4.95

TRADITIONAL  PATCHWORK  PATTERNS,  Carol  Belanger  Grafton.  Card¬ 
board  cut-out  pieces  for  use  as  templates  to  make  12  quilts:  Buttercup,  Ribbon 
Border,  Tree  of  Paradise,  nine  more.  Full  instructions.  57pp.  8M  x  I  L 

23015-5  Pa.  $3.50 
25  KITES  THAT  FLY,  Leslie  Hunt.  Full,  easy-to-follow  instructions  for  kites
made  from  inexpensive  materials.  Many  novelties.  70  illustrations.  110pp.  5%  x  8H.

22550-X  Pa.  $2.25 

PIANO  TUNING,  J.  Cree  Fischer.  Clearest,  best  book  for  beginner,  amateur. 
Simple  repairs,  raising  dropped  notes,  tuning  by  easy  method  of  flattened  fifths.  No 
previous  skills  needed.  4  illustrations.  201pp.  5%  x  SH.  23267-0  Pa.  $3.50 

EARLY  AMERICAN  IRON-ON  TRANSFER  PATTERNS,  edited  by  Rita  Weiss. 
75  designs,  borders,  alphabets,  from  traditional  American  sources.  48pp.  8i4  x  H. 

23162-3  Pa.  $1.95 

CROCHETING  EDGINGS,  edited  by  Rita  Weiss.  Over  100  of  the  best  designs  for 
these  lovely  trims  for  a  host  of  household  items.  Complete  instructions,  illustra¬ 
tions.  48pp.  8'^  X  11.  24031-2  Pa.  $2.25 

FINGER  PLAYS  FOR  NURSERY  AND  KINDERGARTEN,  Emilie  Poulsson.  18 
finger  plays  with  music  (voice  and  piano);  entertaining,  instructive.  Counting, 
nature  lore,  etc.  Victorian  classic.  53  illustrations.  80pp.  eVz  x  9^4.22588-7  Pa.  $1.95 

BOSTON  THEN  AND  NOW,  Peter  Vanderwarker.  Here  in  59  side-by-side  views 
are  photographic  documentations  of  the  city’s  past  and  present.  1 19  photographs. 
Full  captions.  122pp.  SH  x  n.  24312-5  Pa.  $6.95 

CROCHETING  BEDSPREADS,  edited  by  Rita  Weiss.  22  patterns,  originally 
published  in  three  instruction  books  1939-41.  39  photos,  8  charts.  Instructions. 
48pp.  84  X  11.  23610-2  Pa.  $2.00 

HAWTHORNE  ON  PAINTING,  Charles  W.  Hawthorne.  Collected  from  notes 
taken  by  students  at  famous  Cape  Cod  School;  hundreds  of  direct,  personal  apercus, 
ideas,  suggestions.  91pp.  5%  x  8^4.  20653-X  Pa.  $2.50 

THERMODYNAMICS,  Enrico  Fermi.  A  classic  of  modern  science.  Clear,  organ¬ 
ized  treatment  of  systems,  first  and  second  laws,  entropy,  thermodynamic  poten¬ 
tials,  etc.  Calculus  required.  160pp.  5)^  x  84.  60361 -X  Pa.  $4.00 

TEN  BOOKS  ON  ARCHITECTURE,  Vitruvius.  The  most  important  book  ever 
written  on  architecture.  Early  Roman  aesthetics,  technology,  classical  orders,  site 
selection,  all  other  aspects.  Morgan  translation.  331pp.  5%  x  84.  20645-9  Pa.  $5.50 

THE  CORNELL  BREAD  BOOK,  Clive  M.  McCay  and  Jeanette  B.  McCay.  Famed 
high-protein  recipe  incorporated  into  breads,  rolls,  buns,  coffee  cakes,  pizza,  pie 
crusts,  more.  Nearly  50  illustrations.  48pp.  84  x  n.  23995-0  Pa.  $2.00 

THE  CRAFTSMAN’S  HANDBOOK,  Cennino  Cennini.  15th-century  handbook, 
school  of  Giotto,  explains  applying  gold,  silver  leaf;  gesso;  fresco  painting, 
grinding  pigments,  etc.  142pp.  64  x  94.  20054-X  Pa.  $3.50 

FRANK  LLOYD  WRIGHT’S  FALLINGWATER,  Donald  Hoffmann.  Full  story 
of  Wright’s  masterwork  at  Bear  Run,  Pa.  100  photographs  of  site,  construction,  and 
details  of  completed  structure.  112pp.  94  x  10.  23671-4  Pa.  $6.95 

OVAL  STAINED  GLASS  PATTERN  BOOK,  C.  Eaton.  60  new  designs  framed  in 
shape  of  an  oval.  Greater  complexity,  challenge  with  sinuous  cats,  birds,  mandalas 
framed  in  antique  shape.  64pp.  84  x  n.  24519-5  Pa.  $3.50 
CHILDREN’S  BOOKPLATES  AND  LABELS,  Ed  Sibbett,  Jr.  6  each  of  12  types
based  on  Wizard  of  Oz,  Alice,  nursery  rhymes,  fairy  tales.  Perforated;  full  color. 
24pp.  8^4  X  11.  23538-6  Pa.  $3.50 

READY-TO-USE  VICTORIAN  COLOR  STICKERS:  96  Pressure-Sensitive  Seals, 
Carol  Belanger  Grafton.  Drawn  from  authentic  period  sources.  Motifs  include 
heads  of  men,  women,  children,  plus  florals,  animals,  birds,  more.  Will  adhere  to 
any  clean  surface.  8pp.  8/^  x  H.  24551-9  Pa.  $2.95 

CUT  AND  FOLD  PAPER  SPACESHIPS  THAT  FLY,  Michael  Grater.  16 
colorful,  easy-to-build  spaceships  that  really  fly.  Star  Shuttle,  Lunar  Freighter,  Star 
Probe,  13  others.  32pp.  8^4  x  1 1.  23978-0  Pa.  $2.50

CUT  AND  ASSEMBLE  PAPER  AIRPLANES  THAT  FLY,  Arthur  Baker.  8 
aerodynamically  sound,  ready-to-build  paper  airplanes,  designed  with  latest
techniques.  Fly  Pegasus,  Daedalus,  Songbird,  5  other  aircraft.  Instructions.  32pp.
914  X  1  Pi.  24302-8  Pa.  $3.95 

SIDELIGHTS  ON  RELATIVITY,  Albert  Einstein.  Two  lectures  delivered  in 
1920-21:  Ether  and  Relativity  and  Geometry  and  Experience.  Elegant  ideas  in 
non-mathematical  form.  56pp.  x  8^.  24511-X  Pa.  $2.25

FADS  AND  FALLACIES  IN  THE  NAME  OF  SCIENCE,  Martin  Gardner.  Fair, 
witty  appraisal  of  cranks  and  quacks  of  science:  Velikovsky,  orgone  energy,  Bridey 
Murphy,  medical  fads,  etc.  373pp.  5%  x  8^.  20394-8  Pa.  $5.95 

VACATION  HOMES  AND  CABINS,  U.S.  Dept.  of  Agriculture.  Complete  plans
for  16  cabins,  vacation  homes  and  other  shelters.  105pp.  9  x  12.  23631-5  Pa.  $4.95 

HOW  TO  BUILD  A  WOOD-FRAME  HOUSE,  L.O.  Anderson.  Placement, 
foundations,  framing,  sheathing,  roof,  insulation,  plaster,  finishing — almost 
everything  else.  179  illustrations.  223pp.  7%  x  10^.  22954-8  Pa.  $5.50 

THE  MYSTERY  OF  A  HANSOM  CAB,  Fergus  W.  Hume.  Bizarre  murder  in  a 
hansom  cab  leads  to  engrossing  investigation.  Memorable  characters,  rich  atmo¬ 
sphere.  19th-century  bestseller,  still  enjoyable,  exciting.  256pp.  b%  x  8. 

21956-9  Pa.  $4.00 

MANUAL  OF  TRADITIONAL  WOOD  CARVING,  edited  by  Paul  N.  Hasluck. 
Possibly  the  best  book  in  English  on  the  craft  of  wood  carving.  Practical 
instructions,  along  with  1,146  working  drawings  and  photographic  illustrations. 
576pp.  6J^  X  9*74.  23489-4  Pa.  $8.95 

WHITTLING  AND  WOODCARVING,  E.J.  Tangerman.  Best  book  on  market;
clear,  full.  If  you  can  cut  a  potato,  you  can  carve  toys,  puzzles,  chains,  etc.  Over  464 
illustrations.  293pp.  5%  x  8^.  20965-2  Pa.  $4.95 

AMERICAN  TRADEMARK  DESIGNS,  Barbara  Baer  Capitman.  732  marks,  logos 
and  corporate-identity  symbols.  Categories  include  entertainment,  heavy  industry, 
food  and  beverage.  All  black-and-white  in  standard  forms.  160pp.  x  11. 

23259-X  Pa.  $6.95

DECORATIVE  FRAMES  AND  BORDERS,  edited  by  Edmund  V.  Gillon,  Jr. 
Largest  collection  of  borders  and  frames  ever  compiled  for  use  of  artists  and 
designers.  Renaissance,  neo-Greek,  Art  Nouveau,  Art  Deco,  to  mention  only  a  few 
styles.  396  illustrations.  192pp.  8%  x  11^,  22928-9  Pa.  $6.00 
REEDER,  Edgar  Wallace.  Eight  suspenseful

JZf  f  p®  o"®  Ewtures  the  donnish  Mr.  J.G. 

Reeder  of  Public  Prosecutor’s  Office.  128pp.  5%  x  8)i.  (Available  in  U.S.  only) 

24374-5  Pa.  $3.50 

ANNE  ORR’S  CHARTED  DESIGNS,  Anne  Orr.  Best  designs  by  premier 
needlework  designer,  all  on  charts:  flowers,  borders,  birds,  children,  alphabets,  etc.
Over  100  charts,  10  in  color.  Total  of  40pp.  8'i  x  1 1.  23704-4  Pa.  $2.50

BASIC  CONSTRUCTION  TECHNIQUES  FOR  HOUSES  AND  SMALL
BUILDINGS  SIMPLY  EXPLAINED,  U.S.  Bureau  of  Naval  Personnel.  Grading,
masonry,  woodworking,  floor  and  wall  framing,  roof  framing,  plastering,  tile
setting,  much  more.  Over  675  illustrations.  568pp.  6!i  x  91^.  20242-9  Pa.  $8.95 

MATISSE  LINE  DRAWINGS  AND  PRINTS,  Henri  Matisse.  Representative 
mia female  nudes,  faces,  still  lifes,  experimental  works,  etc.,  from  1898  to 
1948.  50  illustrations.  48pp.  8?^  x  l  l]i,  23877-6  Pa.  $2.50

HOW  TO  PLAY  THE  CHESS  OPENINGS,  Eugene  Znosko-Borovsky.  Clear,
profound  examinations  of  just  what  each  opening  is  intended  to  do  and  how
opponent  can  counter.  Many  sample  games.  147pp.  5%  x  22795-2  Pa.  $2.95

DUPLICATE  BRIDGE,  Alfred  Sheinwold.  Clear,  thorough,  easily  followed 
account:  rules,  etiquette,  scoring,  strategy,  bidding;  Goren’s  point-count  system,
Blackwood  and  Gerber  conventions,  etc.  158pp.  5%  x  8^.  22741-3  Pa.  $3.00

PORTRAIT  DRAWINGS,  J.S.  Sargent.  Collection  of  42  portraits
reveals  technical  skill  and  intuitive  eye  of  noted  American  portrait  painter,  John 
Singer  Sargent.  48pp.  8>,i  X  11^.  24524-1  Pa.  $2.95

ENTERTAINING  SCIENCE  EXPERIMENTS  WITH  EVERYDAY  OBJECTS,
Martin  Gardner.  Over  100  experiments  for  youngsters.  Will  amuse,  astonish,  teach,
and  entertain.  Over  100  illustrations.  127pp.  514  x  8)4.  24201-3  Pa.  $2.50 

TEDDY  BEAR  PAPER  DOLLS  IN  FULL  COLOR:  A  Family  of  Four  Bears  and
Their  Costumes,  Crystal  Collins.  A  family  of  four  Teddy  Bear  paper  dolls  and
nearly  60  cut-out  costumes.  Full  color,  printed  one  side  only.  32pp.  9%  x  12>4. 

24550-0  Pa.  $3.50

NEW  CALLIGRAPHIC  ORNAMENTS  AND  FLOURISHES,  Arthur  Baker.
Unusual,  multi-useable  material:  arrows,  pointing  hands,  brackets  and  frames,
ovals,  swirls,  birds,  etc.  Nearly  700  illustrations.  80pp.  8%  x  ]l>4. 

24095-9  Pa.  $3.75 

DINOSAUR  DIORAMAS  TO  CUT  &  ASSEMBLE,  M.  Kalmenoff.  Two  complete
three-dimensional  scenes  in  full  color,  with  31  cut-out  animals  and  plants. 
Excellent  educational  toy  for  youngsters.  Instructions;  2  assembly  diagrams.  32pp. 

24541-1  Pa.  $4.50

SILHOUETTES:  A  PICTORIAL  ARCHIVE  OF  VARIED  ILLUSTRATIONS,
edited  by  Carol  Belanger  Grafton.  Over  600  silhouettes  from  the  18th  to  20th
centuries.  Profiles  and  full  figures  of  men,  women,  children,  birds,  animals,  groups 
and  scenes,  nature,  ships,  an  alphabet.  144pp.  S%  x  IIM.  23781-8  Pa.  $4.95
SURREAL  STICKERS  AND  UNREAL  STAMPS,  William  Rowe.  224  haunting,
hilarious  stamps  on  gummed,  perforated  stock,  with  images  of  elephants,  geisha 
girls,  George  Washington,  etc.  16pp.  one  side.  8H  x  11.  24371-0  Pa.  $3.50 

GOURMET  KITCHEN  LABELS,  Ed  Sibbett,  Jr.  112  full-color  labels  (4  copies 
each  of  28  designs).  Fruit,  bread,  other  culinary  motifs.  Gummed  and  perforated. 
16pp.  24087-8  Pa.  $2.95 

PATTERNS  AND  INSTRUCTIONS  FOR  CARVING  AUTHENTIC  BIRDS, 
H.D.  Green.  Detailed  instructions,  27  diagrams,  85  photographs  for  carving  15 
species  of  birds  so  life-like,  they’ll  seem  ready  to  fly!  SVa  x  1 1.  24222-6  Pa.  $2.75 

FLATLAND,  E.  A.  Abbott.  Science-fiction  classic  explores  life  of  2-D  being  in  3-D 
world.  16  illustrations.  103pp.  5^  x  8.  20001-9  Pa.  $2.00 

DRIED  FLOWERS,  Sarah  Whitlock  and  Martha  Rankin.  Concise,  clear,  practical 
guide  to  dehydration,  glycerinizing,  pressing  plant  material,  and  more.  Covers  use
of  silica  gel.  12  drawings.  32pp.  5^  x  8H.  21802-3  Pa.  $1.00 

EASY-TO-MAKE  CANDLES,  Gary  V.  Guy.  Learn  how  easy  it  is  to  make  all  kinds 
of  decorative  candles.  Step-by-step  instructions.  82  illustrations.  48pp.  8^/4  x  H. 

23881-4  Pa.  $2.50 

SUPER  STICKERS  FOR  KIDS,  Carolyn  Bracken.  128  gummed  and  perforated 
full-color  stickers:  GIRL  WANTED,  KEEP  OUT,  BORED  OF  EDUCATION, 
X-RATED,  COMBAT  ZONE,  many  others.  16pp.  8M  x  1 L  24092-4  Pa.  $2.50 

CUT  AND  COLOR  PAPER  MASKS,  Michael  Grater.  Clowns,  animals,  funny 
faces... simply  color  them  in,  cut  them  out,  and  put  them  together,  and  you  have  9 
paper  masks  to  play  with  and  enjoy.  32pp.  8M  x  1 1.  23171-2  Pa.  $2.25 

A  CHRISTMAS  CAROL:  THE  ORIGINAL  MANUSCRIPT,  Charles  Dickens. 
Clear  facsimile  of  Dickens  manuscript,  on  facing  pages  with  final  printed  text.  8 
illustrations  by  John  Leech,  4  in  color  on  covers.  144pp.  8%  x  im. 

20980-6  Pa.  $5.95 

CARVING  SHOREBIRDS,  Harry  V.  Shourds  &  Anthony  Hillman.  16  full-size 
patterns  (all  double-page  spreads)  for  19  North  American  shorebirds  with  step-by- 
step  instructions.  72pp.  9Vi  24287-0  Pa.  $4.95 

THE  GENTLE  ART  OF  MATHEMATICS,  Dan  Pedoe.  Mathematical  games, 
probability,  the  question  of  infinity,  topology,  how  the  laws  of  algebra  work, 
problems  of  irrational  numbers,  and  more.  42  figures.  143pp.  x  8^.  (EBE) 

22949-1  Pa.  $3.50

READY-TO-USE  DOLLHOUSE  WALLPAPER,  Katzenbach  &  Warren,  Inc.
Stripe,  2  floral  stripes,  2  allover  florals,  polka  dot;  all  in  full  color.  4  sheets  (350  sq. 
in.)  of  each,  enough  for  average  room.  48pp.  8^/^  x  H.  23495-9  Pa.  $2.95

MINIATURE  IRON-ON  TRANSFER  PATTERNS  FOR  DOLLHOUSES,
DOLLS,  AND  SMALL  PROJECTS,  Rita  Weiss  and  Frank  Fontani.  Over  100 
miniature  patterns:  rugs,  bedspreads,  quilts,  chair  seats,  etc.  In  standard  dollhouse 
size.  48pp.  8^i  X  11.  23741-9  Pa.  $1.95 

THE  DINOSAUR  COLORING  BOOK,  Anthony  Rao.  45  renderings  of  dinosaurs, 
fossil  birds,  turtles,  other  creatures  of  Mesozoic  Era.  Scientifically  accurate. 
Captions.  48pp.  SVi  x  11.  24022-3  Pa.  $2.50 
JAPANESE  DESIGN  MOTIFS,  Matsuya  Co.  Mon,  or  heraldic  designs.  Over  4000
typical,  beautiful  designs:  birds,  animals,  flowers,  swords,  fans,  geometrics;  all
beautifully  stylized.  213pp.  1 1^  x  22874-6  Pa.  $7.95

THE  TALE  OF  BENJAMIN  BUNNY,  Beatrix  Potter.  Peter  Rabbit’s  cousin  coaxes 
him  back  into  Mr.  McGregor’s  garden  for  a  whole  new  set  of  adventures.  All  27
full-color  illustrations.  59pp.  x  5^.  (Available  in  U.S.  only)  21102-9  Pa.  $1.75 

THE  TALE  OF  PETER  RABBIT  AND  OTHER  FAVORITE  STORIES  BOXED 
SET,  Beatrix  Potter.  Seven  of  Beatrix  Potter’s  best-loved  tales  including  Peter 
Rabbit  in  a  specially  designed,  durable  boxed  set.  4\i  x  5H.  Total  of  447pp.  158  color 
illustrations.  (Available  in  U.S.  only)  23903-9  Pa.  $10.80 


PRACTICAL  MENTAL  MAGIC,  Theodore  Annemann.  Nearly  200  astonishing
feats  of  mental  magic  revealed  in  step-by-step  detail.  Complete  advice  on  staging,
patter,  etc.  Illustrated.  320pp.  5%  x  8/^.  24426-1  Pa.  $5.95


CELEBRATED  CASES  OF  JUDGE  DEE  (DEE  GOONG  AN),  translated  by 
Robert  Van  Gulik.  Authentic  18th-century  Chinese  detective  novel;  Dee  and
associates  solve  three  interlocked  cases.  Led  to  van  Gulik’s  own  stories  with  same 
characters.  Extensive  introduction.  9  illustrations.  237pp.  5^  x  8^. 

23337-5  Pa.  $4.50 


CUT  &  FOLD  EXTRATERRESTRIAL  INVADERS  THAT  FLY,  M.  Grater. 
Stage  your  own  lilliputian  space  battles.  By  following  the  step-by-step  instructions
and  explanatory  diagrams  you  can  launch  22  full-color  fliers  into  space.  36pp.  8M  x 

24478-4  Pa.  $2.95 


CUT  &  ASSEMBLE  VICTORIAN  HOUSES,  Edmund  V.  Gillon,  Jr.  Printed  in
full  color  on  heavy  cardboard  stock,  4  authentic  Victorian  houses  in  H-O  scale:
Italian-style  Villa,  Octagon,  Second  Empire,  Stick  Style.  48pp.  x

23849-0  Pa.  $3.95 


BEST  SCIENCE  FICTION  STORIES  OF  H.G.  WELLS,  H.G.  Wells.  Full  novel 

stories:  “The  Crystal  Egg,”  “Aepyornis  Island,” 
“The  Strange  Orchid,”  etc.  303pp.  5^  x  8^^.  (Available  in  U.S.  only)

21531-8  Pa.  $4.95 

TRADEMARK  DESIGNS  OF  THE  WORLD,  Yusaku  Kamekura.  A  lavish 
collection  of  nearly  700  trademarks,  the  work  of  Wright,  Loewy,  Klee,  Binder,
hundreds  of  others.  160pp.  8^4  x  8.  (Available  in  U.S.  only)  24191-2  Pa.  $5.95 


THE  ARTIST’S  AND  CRAFTSMAN’S  GUIDE  TO  REDUCING,  ENLARGING 
AND  TRANSFERRING  DESIGNS,  Rita  Weiss.  Discover,  reduce,  enlarge,  transfer
designs  from  any  objects  to  any  craft  project.  12pp.  plus  16  sheets  special  graph 

24142-4  Pa.  $3.50 


TREASURY  OF  JAPANESE  DESIGNS  AND  MOTIFS  FOR  ARTISTS  AND 
CRAFTSMEN,  edited  by  Carol  Belanger  Grafton.  Indispensable  collection  of  360
traditional  Japanese  designs  and  motifs  redrawn  in  clean,  crisp  black-and-white, 
copyright-free  illustrations.  96pp.  8!4  x  n.  24435-0  Pa.  $3.95
CHANCERY  CURSIVE  STROKE  BY  STROKE,  Arthur  Baker.  Instructions  and
illustrations  for  each  stroke  of  each  letter  (upper  and  lower  case)  and  numerals.  54 
full-page  plates.  64pp.  8M  x  11.  24278-1  Pa.  $2.50 

THE  ENJOYMENT  AND  USE  OF  COLOR,  Walter  Sargent.  Color  relationships, 
values,  intensities;  complementary  colors,  illumination,  similar  topics.  Color  in 
nature  and  art.  7  color  plates,  29  illustrations.  274pp.  5^  x  20944-X  Pa.  $4.95 

SCULPTURE  PRINCIPLES  AND  PRACTICE,  Louis  Slobodkin.  Step-by-step 
approach  to  clay,  plaster,  metals,  stone;  classical  and  modern.  253  drawings, 
photos.  255pp.  8/4  X  11.  22960-2  Pa.  $7.50 

VICTORIAN  FASHION  PAPER  DOLLS  FROM  HARPER’S  BAZAR,  1867-1898, 
Theodore  Menten.  Four  female  dolls  with  28  elegant  high  fashion  costumes, 
printed  in  full  color.  32pp.  9%  x  12*/4.  23453-3  Pa.  $3.50 

FLOPSY,  MOPSY  AND  COTTONTAIL:  A  Little  Book  of  Paper  Dolls  in  Full 
Color,  Susan  LaBelle.  Three  dolls  and  21  costumes  (7  for  each  doll)  show  Peter 
Rabbit’s  siblings  dressed  for  holidays,  gardening,  hiking,  etc.  Charming  borders, 
captions.  48pp  4U  x  24376-1  Pa.  $2.25 

NATIONAL  LEAGUE  BASEBALL  CARD  CLASSICS,  Bert  Randolph  Sugar.  83 
big-leaguers  from  1909-69  on  facsimile  cards.  Hubbell,  Dean,  Spahn,  Brock  plus 
advertising,  info,  no  duplications.  Perforated,  detachable.  16pp.  SVi  x  11.

24308-7  Pa.  $2.95 

THE  LOGICAL  APPROACH  TO  CHESS,  Dr.  Max  Euwe,  et  al.  First-rate  text  of 
comprehensive  strategy,  tactics,  theory  for  the  amateur.  No  gambits  to  memorize,
just  a  clear,  logical  approach.  224pp.  5%  x  814.  24353-2  Pa.  $4.50 

MAGICK  IN  THEORY  AND  PRACTICE,  Aleister  Crowley.  The  summation  of 
the  thought  and  practice  of  the  century’s  most  famous  necromancer,  long  hard  to
find.  Crowley’s  best  book.  436pp.  5%  x  814.  (Available  in  U.S.  only)

23295-6  Pa.  $6.50 

THE  HAUNTED  HOTEL,  Wilkie  Collins.  Collins’  last  great  tale;  doom  and 
destiny  in  a  Venetian  palace.  Praised  by  T.S.  Eliot.  127pp.  b%  x  814. 

24333-8  Pa.  $3.00 

ART  DECO  DISPLAY  ALPHABETS,  Dan  X.  Solo.  Wide  variety  of  bold  yet 
elegant  lettering  in  handsome  Art  Deco  styles.  100  complete  fonts,  with  numerals,
punctuation,  more.  104pp.  814  x  H.  24372-9  Pa.  $4.50 

CALLIGRAPHIC  ALPHABETS,  Arthur  Baker.  Nearly  150  complete  alphabets  by 
outstanding  contemporary.  Stimulating  ideas;  useful  source  for  unique  effects.  154
plates.  157pp.  814  x  IPL  21045-6  Pa.  $5.95 

ARTHUR  BAKER’S  HISTORIC  CALLIGRAPHIC  ALPHABETS,  Arthur 
Baker.  From  monumental  capitals  of  first-century  Rome  to  humanistic  cursive  of 
16th  century,  33  alphabets  in  fresh  interpretations.  88  plates.  96pp.  9  x  12. 

24054-1  Pa.  $4.50 

LETTIE  LANE  PAPER  DOLLS,  Sheila  Young.  Genteel  turn-of-the-century 
family  very  popular  then  and  now.  24  paper  dolls.  16  plates  in  full  color.  32pp.  914  x 
12^  24089-4  Pa.  $3.50 
JSrc  postcards  in  full  color  from

CLASSIC  POSTERS,  Hayward  and  Blanche  Cirker.  Ready-to-mail  postcards 
reproduced  from  rare  set  of  poster  art.  Works  by  Toulouse-Lautrec,  Parrish,
Steinlen,  Mucha,  Cheret,  others.  12pp.  8Kx  11.  24389-3  Pa.  $2.95

FRUIT  KEY  AND  TWIG  KEY  TO  TREES  AND  SHRUBS,  William  M.  Harlow. 

s^cL"^  EasiW  u'sed  ‘^n  species;  twig  key  covers  160  deciduous 

species.  Easily  used.  Over  300  photographs.  126pp.  5%  x  81i.  20511-8  Pa.  $2.25

LEONARDO  DRAWINGS,  Leonardo  da  Vinci.  Plants,  landscapes,  human  face 
and  figure,  etc.,  plus  studies  for  Sforza  monument,  Last  Supper,  more.  60
illustrations.  64pp.  814  XI li4.  23951-9  Pa.  $2.75

«n  u  baseball  cards,  edited  by  Bert  R.  Sugar.  98  classic  cards  on  heavy 
w  ’  2''’  fo’'  detaching.  Ruth,  Cobb,  Durocher,  DiMaggio,  H. 

Wagner,  99  others.  Rare  originals  cost  hundreds.  16pp.  8*4  x  ip  23498-3  Pa.  $3.25

TREES  OF  THE  EASTERN  AND  CENTRAL  UNITED  STATES  AND
CANADA,  William  M.  Harlow.  Best  one-volume  guide  to  140  trees.  Full
descriptions,  woodlore,  range,  etc.  Over  600  illustrations.  Handy  size.  288pp.  4Hx

20395-6  Pa.  $3.95 

JUDY  GARLAND  PAPER  DOLLS  IN  FULL  COLOR,  Tom  Tierney.  3  Judy
Garland  paper  dolls  (teenager,  grown-up,  and  mature  woman)  and  30  gorgeous 
costumes  highlighting  memorable  career.  Captions.  32pp.  9!4  x  m. 

24404-0  Pa.  $3.50 

GREAT  FASHION  DESIGNS  OF  THE  BELLE  EPOQUE  PAPER  DOLLS  IN
FULL  COLOR,  Tom  Tierney.  Two  dolls  and  30  costumes  meticulously  rendered

3200  9^X1  other  greats  late  Victorian  to  WWI. 

3/pp.  94  124.  24425-3  Pa.  $3.50 

FASHION  PAPER  DOLLS  FROM  GODEY’S  LADY’S  BOOK,  1840-1854,  Susan

wT'fu"  “ix'  ^  t^tth  50  costumes.  Little  girl’s,  bridal, 

riding,  bathing,  wedding,  evening,  everyday,  etc.  32pp.  9'4  x  12*4.

23511-4  Pa.  $3.95 

THE  BOOK  OF  THE  SACRED  MAGIC  OF  ABRAMELIN  THE  MAGE,
translated  by  S.  MacGregor  Mathers.  Medieval  manuscript  of  ceremonial  magic.
Basic  document  in  Aleister  Crowley,  Golden  Dawn  groups.  268pp.  5%  x  8)f.

23211-5  Pa.  $5.00 

PETER  RABBIT  POSTCARDS  IN  FULL  COLOR:  24  Ready-to-Mail  Cards,
Susan  Whited  LaBelle.  Bunnies  ice-skating,  coloring  Easter  eggs,  making  valentines,
many  other  charming  scenes.  24  perforated  full-color  postcards,  each
measuring  44  x  6,  on  coated  stock.  12pp.  9  x  12.  24617-5  Pa.  $2.95

CELTIC  HAND  STROKE  BY  STROKE,  A.  Baker.  Complete  guide  to  creating  each
letter  of  the  alphabet  in  distinctive  Celtic  manner.  Covers  hand  position,  strokes,
pens,  inks,  paper,  more.  Illustrated.  48pp.  8*4  x  H.  24336-2  Pa.  $2.50
KEYBOARD  WORKS  FOR  SOLO  INSTRUMENTS,  G.F.  Handel.  35  neglected
works  from  Handel’s  vast  oeuvre,  originally  jotted  down  as  improvisations. 
Includes  Eight  Great  Suites,  others.  New  sequence.  174pp.  9X  x 

AMERICAN  LEAGUE  BASEBALL  CARD  CLASSICS,  Bert  Randolph  Sugar.  82
stars  from  1900s  to  60s  on  facsimile  cards.  Ruth,  Cobb,  Mantle,  Williams,  plus 
advertising,  info,  no  duplications.  Perforated,  detachable.  16pp.  SM^x^l  L 

A  TREASURY  OF  CHARTED  DESIGNS  FOR  NEEDLEWORKERS,  Georgia 
Gorham  and  Jeanne  Warth.  141  charted  designs:  owl,  cat  with  yarn,  tulips,  piano,
spinning  wheel,  covered  bridge,  Victorian  house  and  many  j  “g 

DANISH  FLORAL  CHARTED  DESIGNS,  Gerda  Bengtsson.  Exquisite  collection 
of  over  40  different  florals:  anemone,  Iceland  poppy,  wild  fruit,  pansies,  many
others.  45  illustrations.  48pp.  SVi  X  11.  23957-8  Pa.  $  .7 

OLD  PHILADELPHIA  IN  EARLY  PHOTOGRAPHS  1839-1914,  Robert  F. 
Looney.  215  photographs:  panoramas,  street  scenes,  landmarks,  President-elect
Lincoln’s  visit,  1876  Centennial  Exposition,  much  more. 

PRELUDE  TO  MATHEMATICS,  W.W.  Sawyer.  Noted  mathematician’s  lively, 
stimulating  account  of  non-Euclidean  geometry,  matrices,  determinants,  group
theory,  other  topics.  Emphasis  on  novel,  striking  aspects. 

ADVENTURES  WITH  A  MICROSCOPE,  Richard  Headstrom.  59  adventures 
with  clothing  fibers,  protozoa,  ferns  and  lichens,  roots  and  leaves  much  ‘  2 

illustrations.  232pp.  5?f  x  81i.  23471-1  Pa.  $3.95

IDENTIFYING  ANIMAL  TRACKS:  MAMMALS,  BIRDS,  AND  OTHER 
ANIMALS  OF  THE  EASTERN  UNITED  STATES,  Richard  Headstrom.  For 
hunters,  naturalists,  scouts,  nature-lovers.  Diagrams  of  $3:50 

cation.  128pp.  5%  X  8. 

VICTORIAN  FASHIONS  AND  COSTUMES  FROM  HARPER’S  BAZAR,  1867- 
1898  edited  by  Stella  Blum.  Day  costumes,  evening  wear,  sports  clothes,  shoes,  hats, 
other  accessories  in  over  1,000  detailed  engravings.  320pp.  914  x  ^ 

EVERYDAY  FASHIONS  OF  THE  TWENTIES  AS  PICTURED  IN  SEARS  AND 
OTHER  CATALOGS,  edited  by  Stella  Blum.  Actual  dress  of  the  Roaring 
Twenties,  with  text  by  Stella  Blum.  Over  750  illustrations,  gPP^^p'jO 

HALL  OF  FAME  BASEBALL  CARDS,  edited  by  Bert  Randolph  Sugar.  Cy  Young,
Ted  Williams,  Lou  Gehrig,  and  many  other  Hall  of  Fame  greats  on  92  full-color, 
detachable  reprints  of  early  baseball  cards.  No  duplication  of  p,  «  50 

Baseball  Cards.  iepp.m^U.  23624-2  Pa.  $4.50 

THE  ART  OF  HAND  LETTERING,  Helm  Wotzkow.  Course  in  hand  lettering, 
Roman,  Gothic,  Italic,  Block,  Script.  Tools,  proportions,  optical  aspects,  individual
variation.  Very  quality  conscious.  Hundreds  of  specimens.  |20pP-^5*^
THE  RIME  OF  THE  ANCIENT  MARINER,  Gustave  Dore,  S.T.  Coleridge.
Dore’s  finest  work.  34  plates  capture  moods,  subtleties  of  poem.  Full  text.  77pp.  m  x

22305-1  Pa.  $4.95

SONGS  OF  INNOCENCE,  William  Blake.  The  first  and  most  popular  of  Blake’s
famous  “Illuminated  Books,”  in  a  facsimile  edition  reproducing  all  31  brightly
colored  plates.  Additional  printed  text  of  each  poem.  64pp.  BVa  x  7.

22764-2  Pa.  $3.50 

AN  INTRODUCTION  TO  INFORMATION  THEORY,  J.R.  Pierce.  Second
(1980)  edition  of  most  impressive  non-technical  account  available.  Encoding,
entropy,  noisy  channel,  related  areas,  etc.  320pp.  bYs  x  gJ/j.  24061-4  Pa.  |4.^

THE  DIVINE  PROPORTION.  A  STUDY  IN  MATHEMATICAL  BEAUTY 
proportion”  or  “golden  ratio”  in  poetry,  Pascal’s  triangle,
philosophy,  psychology,  music,  mathematical  figures,  etc.  Excellent  bridge 
between  science  and  art.  58  figures.  185pp.  bYs  x  8^.  22254-3  Pa.  $3.95

M^i^rv  haniro^^  WALKING  GUIDE:  From  the  Battery  to  Wall  Street, 

Mary  J.  Shapiro.  Superb  inexpensive  guide  to  historic  buildings  and  locales  in
lower  Manhattan:  Trinity  Church,  Bowling  Green,  more.  Complete  text;  maps.  36
illustrations.  48pp.  37.  x  g-U  24225-0  Pa  $Lo

NEW  YORK  THEN  AND  NOW,  Edward  B.  Watson,  Edmund  V.  Gillon,  Jr.  83

lOT?’  if"‘  facing  pages  early  photographs  (1875-1925)  and 

1976  photos  by  Gillon.  172  illustrations.  171pp.  9!4  x  10.  23361-8  Pa.  $7.95

HISTORIC  COSTUME  IN  PICTURES,  Braun  &  Schneider.  Over  1450  costumed 

VICTORIAN  AND  EDWARDIAN  FASHION:  A  Photographic  Survey,  Alison 
Gernsheim.  First  fashion  history  completely  illustrated  by  contemporary  photographs,
1840-1914,  in  which  many  celebrities  appear.

240pp.  6^  X9U  24205-6  Pa.  $6.00

OTHER'^NEEDfF^'AFTg’’;^^^K^  COUNTED  CROSS-STITCH  AND 
^  NEEDLECRAFTS,  Lindberg  Press.  Charted  designs  for  45  beautiful 

needlecraft  projects  with  many  yuletide  and  wintertime  motifs.  48pp.  8Vi  x  H. 

24356-7  Pa.  $2.50 

CR  ^  for  COUNTED  CROSS-STITCH  AND  OTHER  NEEDLE- 

CRAFTS,  Carter  Houck.  101  authentic  charted  folk  designs  in  a  wide  array  of  lovely 
representations  with  many  suggestions  for  effective  use.  48pp.  814  x  1 1.

24369-9  Pa.  $2.25 

FIVE  ACRES  AND  INDEPENDENCE,  Maurice  G.  Kains.  Great  back-to-the-land 
classic  explains  basics  of  self-sufficient  farming.  The  one  book  to  get.  95
illustrations.  397pp.  514  x  8)4.  20974-1  Pa.  $4.95

A  MODERN  HERBAL,  Margaret  Grieve.  Much  the  fullest,  most  exact,  most  useful 
compilation  of  herbal  material.  Gigantic  alphabetical  encyclopedia,  from  aconite 
to  zedoary,  gives  botanical  information,  medical  properties,  folklore,  economic 
uses,  and  much  else.  Indispensable  to  serious  reader.  161  illustrations.  888pp.  6)4  x
94.  (Available  in  U.S.  only)  22798-7,  22799-5  Pa.,  Two-vol.  set  $16.45
DECORATIVE  NAPKIN  FOLDING  FOR  BEGINNERS,  Lillian  Oppenheimer
and  Natalie  Epstein.  22  different  napkin  folds  in  the  shape  of  a  heart,  clown’s  hat,
love  knot,  etc.  63  drawings.  48pp.  8*4  X  11.  23797-4  Pa.  $1.95

DECORATIVE  LABELS  FOR  HOME  CANNING,  PRESERVING,  AND 
OTHER  HOUSEHOLD  AND  GIFT  USES,  Theodore  Menten.  128  gummed, 
perforated  labels,  beautifully  printed  in  2  colors.  12  versions.  Adhere  to  metal,  glass,
wood,  ceramics.  24pp.  814  X  11.

EARLY  AMERICAN  STENCILS  ON  WALLS  AND  FURNITURE,  Janet  Waring.
Thorough  coverage  of  19th-century  folk  art:  techniques,  artifacts,  surviving
specimens.  166  illustrations,  7  in  color.  147pp.  of  text.  1%  x  lOyi.  21906-2  Pa.  $9.95 

AMERICAN  ANTIQUE  WEATHERVANES,  A.B.  &  W.T.  Westervelt.  Extensively 
illustrated  1883  catalog  exhibiting  over  550  copper  weathervanes  and  finials. 
Excellent  primary  source  by  one  of  the  principal  manufacturers.^BMpp^  Pa  |3  95 

ART  STUDENTS’  ANATOMY,  Edmond  J.  Farris.  Long  favorite  in  art  schools. 

Basic  elements,  common  positions,  actions.  Full  text,  158 

8/4. 

BRIDGMAN’S  LIFE  DRAWING,  George  B.  Bridgman.  More  than  500  drawings 
and  text  teach  you  to  abstract  the  body  into  its  major  masses.  Also  specific  areas  of
anatomy.  192pp.  6>4  x  9T  (EA)  22710-3  Pa.  $4.50

COMPLETE  PRELUDES  AND  ETUDES  FOR  SOLO  PIANO,  Frederic  Chopin. 
All  26  Preludes,  all  27  Etudes  by  greatest  composer  of  piano  music.  Authoritative
Paderewski  edition.  224pp.  9  x  12.  (Available  in  U.S.  only)  24052-5  Pa.  $7.50 

PIANO  MUSIC  1888-1905,  Claude  Debussy.  Deux  Arabesques,  Suite  Bergamesque, 
Masques,  1st  series  of  Images,  etc.  9  others,  in  corrected 

TEDDY  BEAR  IRON-ON  TRANSFER  PATTERNS,  Ted  Menten.  80  iron-on 
transfer  patterns  of  male  and  female  Teddys  in  a  wide  variety  of  activities,  poses,
sizes.  48pp.  814  x  11.  24596-9  Pa.  $2.25

A  PICTURE  HISTORY  OF  THE  BROOKLYN  BRIDGE,  M.J.  Shapiro.  Pro¬ 
fusely  illustrated  account  of  greatest  engineering  achievement  of  19th  century.  167 
rare  photos  &  engravings  recall  construction,  human  drama.
text.  122pp.  8U  X  11.

NEW  YORK  IN  THE  THIRTIES,  Berenice  Abbott.  Noted  photographer’s
fascinating  study  shows  new  buildings  that  have  become  famous  and  old  sights

INTRODUCTION  TO  ARTIFICIAL  INTELLIGENCE

Can  computers  think?  Can  they  use  reason  to  develop  their  own 
concepts,  solve  complex  problems,  play  games,  understand  our 
languages?  This  comprehensive  survey  of  artificial  intelligence,  the
study  of  how  computers  can  be  made  to  act  intelligently,  explores  these
and  other  fascinating  questions. 

Introduction  to  Artificial  Intelligence  presents  an  introduction  to  the 
science  of  reasoning  processes  in  computers,  and  the  research  approaches
and  results  of  the  past  two  decades.  You’ll  find  lucid,  easy-to-read
coverage  of  problem-solving  methods,  representation  and  models,
game  playing,  automated  understanding  of  natural  languages,  heuristic 
search  theory,  robot  systems,  heuristic  scene  analysis  and  specific 
artificial-intelligence  accomplishments. 

Related  subjects  are  also  included:  predicate-calculus  theorem  proving,
machine  architecture,  psychological  simulation,  automatic  programming,
novel  software  techniques,  industrial  automation  and  much  more.
A  supplementary  section  updates  the  original  book  with  major  research 
from  the  decade  1974-1984.  Abundant  illustrations,  diagrams  and 
photographs  enhance  the  text,  and  challenging  practice  exercises  at  the 
end  of  each  chapter  test  the  student’s  grasp  of  each  subject. 

The  combination  of  introductory  and  advanced  material  makes  Introduction
to  Artificial  Intelligence  ideal  for  both  the  layman  and  the  student  of
mathematics  and  computer  science.  For  anyone  interested  in  the  nature 
of  thought,  it  will  inspire  visions  of  what  computer  technology  might 
produce  tomorrow. 

Revised  and  enlarged  republication  of  the  work  first  published  by 
Petrocelli/Charter,  New  York,  1974.  Revised  Preface.  Extensive  notes 
updating  the  main  text.  Supplementary  Bibliography.  132  black-and-white
illustrations.  512pp.  5⅜  x  8½.  Paperbound.